Fix for aggregate plots in Batch mode #237

Merged
merged 1 commit into from
Aug 4, 2022

Conversation

Colelyman
Contributor

There was a bug that I introduced in CRISPRessoBatch when the sgRNA intervals are equal. This commit fixes it!

@kclem kclem merged commit 4ba524d into pinellolab:master Aug 4, 2022
@kclem kclem deleted the batch-aggregate-fix branch August 4, 2022 17:32
mbowcut2 added a commit to edilytics/CRISPResso2 that referenced this pull request Feb 15, 2024
commit c909ea3b34e87ce637e00dac075d2bb2f8bfb954
Author: McKay <[email protected]>
Date:   Thu Feb 15 15:55:23 2024 -0700

    added plotly dependency for pro

commit 76b3601f6a0144f100266153f1c999e0c5de65de
Author: Samuel Nichols <[email protected]>
Date:   Fri Jan 12 09:56:19 2024 -0700

    Squashed commit of the following:

    commit 603f2eff9d1aa21ae95f3e134da303b8018d3a33
    Author: Samuel Nichols <[email protected]>
    Date:   Fri Jan 12 09:48:20 2024 -0700

        fix guardrails partial

    commit 22fc03183a8070c30dfb74d5c23575ac19019855
    Author: Samuel Nichols <[email protected]>
    Date:   Fri Jan 12 08:54:01 2024 -0700

        Add guardrail partial

    commit e55f6b21972b578261bc5a864ce1d653d98f9e34
    Author: Samuel Nichols <[email protected]>
    Date:   Mon Jan 8 07:50:59 2024 -0700

        Functional guardrails, needs reports update

    commit 6e968e9699ed59a47d88191d03768e042d8b60a4
    Merge: 32b49685 e948ce10
    Author: Samuel Nichols <[email protected]>
    Date:   Mon Dec 18 13:34:36 2023 -0700

        Merge branch 'guardrails-clean-history' of https://github.com/edilytics/CRISPResso2 into guardrails-clean-history

    commit 32b49685da320501dad2b0ebbb57887b66220ba8
    Author: Samuel Nichols <[email protected]>
    Date:   Fri Dec 15 15:27:04 2023 -0700

        Include guardrail functions

    commit 4e309cf6f732565d635de3d4c5d074ada3027e2d
    Author: Cole Lyman <[email protected]>
    Date:   Mon Dec 18 10:51:55 2023 -0700

        Refactor to use CRISPRessoReports module

    commit e648dc087c0055bc5d2fca13c64071a371dea941
    Author: Cole Lyman <[email protected]>
    Date:   Mon Dec 18 10:51:11 2023 -0700

        Add CRISPRessoReports subtree

    commit e948ce107ebb0d1d99010ed12e937f34b5e607d4
    Author: Samuel Nichols <[email protected]>
    Date:   Fri Dec 15 15:27:04 2023 -0700

        Include guardrail functions

    commit d33c748871a625facfe8d792e29c77ab9779138f
    Author: Kendell Clement <[email protected]>
    Date:   Tue Nov 7 16:31:06 2023 -0700

        Include parameter --assign_ambiguous_alignments_to_first_reference in readme

    commit a1435f7f491a6a61434f3051e39f39a4c9bf1edc
    Author: Kendell Clement <[email protected]>
    Date:   Wed Oct 11 17:17:30 2023 -0600

        Enable quantification by sgRNA (#348)

        This PR includes:
        - storing the sgRNA-specific editing locations in the crispresso2_info object. Previously, each amplicon would record the indices of quantification windows across the guide, but not for individual guides. This stores the information for each guide in crispresso2_info['results']['refs'][reference_name]['sgRNA_include_idxs']
        - a script (count_sgRNA_specific_edits.py) to parse through an allele table output from a completed CRISPResso run (`--write_detailed_allele_table` flag required) to count edits in each sgRNA separately.

        I don't have a good double-edited sample handy, but it can be run on the demo HDR data [hdr.fastq.gz](http://crispresso.pinellolab.org/static/demo/hdr.fastq.gz) using the command:

        ```

        CRISPResso -r1 hdr.fastq.gz -a acatttgcttctgacacaactgtgttcactagcaacctcaaacagacaccatggtgcatctgactcctgTggagaagtctgccgttactgccctgtggggcaaggtgaacgtggatgaagttggtggtgaggccctgggcaggttggtatcaaggtta -e acatttgcttctgacacaactgtgttcactagcaacctcaaacagacaccatggtgcaCctgactccGgaggagaagtctgccgttactgcGctgtggggcaaggtgaacgtggatgaagttggtggtgaggccctgggcaggttggtatcaaggtta -c atggtgcatctgactcctgTggagaagtctgccgttactgccctgtggggcaaggtgaacgtggatgaagttggtggtgaggccctgggcag -g TGCACCATGGTGTCTGTTTG,GATGAAGTTGGTGGTGAGGCCC --write_detailed_allele_table  -n hdr3 -p max -gn guide1,guide2
        ```

        ```
        python CRISPResso2/scripts/count_sgRNA_specific_edits.py -f CRISPResso_on_hdr3
        ```

        This produces:
        ```
        Processed 25000 alleles
        Reference: Reference (2391/23415 modified reads)
                UNMODIFIED: 21024
                MODIFIED guide1: 2359
                MODIFIED guide2: 32
        Reference: HDR (856/1577 modified reads)
                UNMODIFIED: 721
                MODIFIED guide1: 854
                MODIFIED guide1 + guide2: 1
                MODIFIED guide2: 1
        ```

    commit 2e3da02fdbed2fa8ae02a277763d65a502459827
    Author: Cole Lyman <[email protected]>
    Date:   Tue Oct 10 15:29:08 2023 -0600

        changed tuple to list for matplotlib change (#31) (#346)

        Co-authored-by: mbowcut2 <[email protected]>

    commit cd3c332135fe4db0f9218e3d87263d5c65838ed9
    Author: Kendell Clement <[email protected]>
    Date:   Sun Oct 1 01:54:46 2023 -0600

        rename script to camel case

    commit 7c719d65fb36ac7654db9040f226564ea28fcab9
    Author: Kendell Clement <[email protected]>
    Date:   Sun Oct 1 01:53:44 2023 -0600

        Add new script for counting high quality bases

    commit f97cd2795e89464bcc9321ccfdbca3e6af2bcb4f
    Author: Kendell Clement <[email protected]>
    Date:   Thu Sep 14 15:15:30 2023 -0600

        Prime editing alignment params (#336)

        Adds two parameters to control alignment of pegRNA components: --prime_editing_gap_open_penalty and --prime_editing_gap_extend_penalty.

        CRISPResso checks to see whether the pegRNA spacer and extension sequence are in the correct orientation, but sometimes they could align in the incorrect orientation with a higher score (e.g. via insertion of multiple gaps, whereas a single long gap would be preferred). Introducing these two parameters allows users to adjust the alignment parameters specifically for these prime-editing checks without adjusting the global alignment parameters which will be applied to reads that are aligned to the WT reference/prime-editing reference sequences.

        The new prime_editing_gap_open_penalty is set to -50, a higher gap open penalty than the default needleman_wunsch_gap_open penalty (-20). This commit breaks backward-reproducibility, but mostly in the checking of pegRNA component orientation - so previously some CRISPResso runs would have failed and produced an error, but now they will (hopefully) succeed. To achieve complete backward reproducibility, add the flag --prime_editing_gap_open_penalty -20 to runs.

    commit 64cbf36dae85cffa2c15e73f2a7ee8aa1077d917
    Author: Cole Lyman <[email protected]>
    Date:   Thu Sep 7 16:43:30 2023 -0600

        Fix samtools piping (#325)

        * Remove samtools pipe stderr to stdout

        Sometimes some of the libraries that samtools depends on don't have the correct
        version information, and as such samtools will report this to stderr when run.
        Because we pipe the output of samtools, we expect it to be valid SAM format, but
        when these library version messages are reported, it breaks CRISPRessoWGS.

        * Remove extra spacing at end of lines and add missing comma in WGS

        * Log stderr from samtools in CRISPRessoWGS
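
The stderr handling described in this commit can be sketched as follows. This is a minimal illustration, not the CRISPRessoWGS code: `run_and_capture` and `CHILD` are hypothetical names, and a small Python child process stands in for samtools so the sketch runs without it. The point is that stdout and stderr are captured separately, so a library version warning is logged but never parsed as SAM.

```python
import logging
import subprocess
import sys

logger = logging.getLogger("samtools_sketch")  # hypothetical logger name

def run_and_capture(cmd):
    """Run cmd and return its stdout; stderr is logged, never mixed into stdout."""
    proc = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,  # kept separate instead of stderr=subprocess.STDOUT
        text=True,
    )
    if proc.stderr:
        logger.debug("stderr from %s: %s", cmd[0], proc.stderr.strip())
    proc.check_returncode()
    return proc.stdout

# Stand-in for a samtools call (assumption): it writes a version warning to
# stderr and one SAM-like line to stdout.
CHILD = (
    "import sys\n"
    "sys.stderr.write('warning: no version information available\\n')\n"
    "sys.stdout.write('read1\\t4\\t*\\t0\\n')\n"
)

sam_text = run_and_capture([sys.executable, "-c", CHILD])
```

With the streams merged, the warning line would end up at the top of `sam_text` and break any parser that expects tab-separated SAM fields.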

    commit 8feff4101f27406d9d88ace97d31a518276bff3f
    Author: Cole Lyman <[email protected]>
    Date:   Fri Sep 1 09:43:56 2023 -0600

        Replace link to CRISPResso schematic with raw URL in README (#329)

        * Replace link to CRISPResso schematic with raw URL

        * Add new lines to the beginning of unordered lists

    commit 2e9e6bff5bcc536d5e2ba1440d1ab96d9d47efd6
    Author: Kendell Clement <[email protected]>
    Date:   Thu Aug 10 00:52:12 2023 -0600

        Try to unbreak CircleCI

    commit ae5b95246cb0f6d66c4cbfb50cf8f5a9626b0827
    Author: Kendell Clement <[email protected]>
    Date:   Thu Aug 10 00:17:27 2023 -0600

        Center command line text messages

    commit 4d9c71ecf2248c9bb1e10430178dc318b6621c8b
    Author: Kendell Clement <[email protected]>
    Date:   Thu Aug 10 00:17:07 2023 -0600

        Fix bug in prime-editing scaffold-incorporation plotting

        If read is too short, scaffold incorporation detection will fail because it will check beyond the length of the read.

    commit 2b36a1a5c35e8a93516ce8baf464595615e0f402
    Author: Kendell Clement <[email protected]>
    Date:   Wed Aug 9 15:29:48 2023 -0600

        CRISPRessoPooled --compile_postrun_references bug fixes

    commit 3e04d1d402bcf95edd39fc7c8c9af61bb380f9db
    Author: Kendell Clement <[email protected]>
    Date:   Tue Aug 8 23:30:15 2023 -0600

        Fix missing ' in Pooled --demultiplex_only_at_amplicons

    commit 06af527f9e2020c5cf251e7f1cec0b1eca1c1664
    Author: Cole Lyman <[email protected]>
    Date:   Mon Jul 24 10:47:46 2023 -0600

        Sort pandas dataframes by # of reads and sequences so that the order is consistent (#316)

        * Make sorting stable

        * Including c files

        * Sort by #Reads instead of %Reads to avoid floating point errors

        ---------

        Co-authored-by: Samuel Nichols <[email protected]>
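
The ordering idea in this commit (#316) can be illustrated without pandas. The sketch below uses plain dictionaries, and the keys `n_reads` and `sequence` are hypothetical stand-ins for the allele table columns. Sorting on the integer read count avoids comparing floating-point percentages, and using the sequence as a tie-breaker makes the order independent of the input order.

```python
def sort_alleles(rows):
    """Sort by read count (descending), then by sequence to break ties."""
    return sorted(rows, key=lambda row: (-row["n_reads"], row["sequence"]))

alleles = [
    {"sequence": "ACGT", "n_reads": 10},
    {"sequence": "AAGT", "n_reads": 10},  # tie on read count with ACGT
    {"sequence": "ACGA", "n_reads": 25},
]

ordered = [row["sequence"] for row in sort_alleles(alleles)]
# The same rows in a different input order produce the same output order.
ordered_reversed_input = [row["sequence"] for row in sort_alleles(alleles[::-1])]
```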

    commit de05533b3511a84f3b6b14fc2ef64db041613261
    Author: Cole Lyman <[email protected]>
    Date:   Thu Jul 6 13:54:45 2023 -0600

        Fix multiprocessing lambda pickling (#311)

        * Fix running plots in parallel

        The reason the plots were running slower before this change is because I was
        calling the plot function, not passing it to `submit`. So it was essentially
        running in serial, but worse because it was still spinning up/down the
        processes.

        * Fix multiprocessing lambda pickling (#20)

        * Refactor process_futures to be a dict

        This makes debugging much easier because you can associate the arguments to the
        future with the results.

        * Fix the pickling error when running in multiprocessing

        Only top-level functions (not lambdas) can be pickled to use in multiprocessing
        pools, thus the lambdas are converted to a regular function.

        * Further fixes to pickling multiprocessing error (#21)

        * Refactor process_futures to be a dict

        This makes debugging much easier because you can associate the arguments to the
        future with the results.

        * Fix the pickling error when running in multiprocessing

        Only top-level functions (not lambdas) can be pickled to use in multiprocessing
        pools, thus the lambdas are converted to a regular function.

        * Use Counter instead of defaultdict in CRISPRessoCORE

        * Update process_futures to dict in Batch and Aggregate
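
The three points in this commit (#311) can be sketched together: pass the callable to `submit` rather than calling it, keep the futures in a dict so each result can be traced back to its arguments, and use a top-level function because lambdas cannot be pickled. This is an illustration under assumptions, not the CRISPResso code: `make_plot`, `run_plots`, and `can_pickle` are hypothetical, and a thread pool is used only to keep the sketch self-contained (`submit` has the same signature on `ProcessPoolExecutor`).

```python
import pickle
from concurrent.futures import ThreadPoolExecutor, as_completed

def make_plot(name):
    """Stand-in for a plotting function (hypothetical)."""
    return f"{name}.pdf"

def run_plots(names, n_workers=2):
    process_futures = {}  # future -> (function name, argument)
    results = {}
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for name in names:
            # Pass the callable and its argument separately. Writing
            # pool.submit(make_plot(name)) would run the plot in the caller
            # first, which makes the work effectively serial.
            future = pool.submit(make_plot, name)
            process_futures[future] = (make_plot.__name__, name)
        for future in as_completed(process_futures):
            _, arg = process_futures[future]
            # result() re-raises any exception raised inside the worker.
            results[arg] = future.result()
    return results

def can_pickle(obj):
    """Return True if obj can be pickled, as a process pool would require."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError):
        return False

plots = run_plots(["quilt", "allele_table"])
lambda_is_picklable = can_pickle(lambda name: name + ".pdf")
```

Keeping the arguments next to each future also means a failed plot can be reported with the inputs that caused it.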

    commit ebb016dff46c280dce8c3c09e8ac0e0cc25d4d74
    Author: Kendell Clement <[email protected]>
    Date:   Mon Jul 3 17:12:09 2023 -0600

        Enable CRISPRessoPooled multiprocessing when os allows multi-thread file append

    commit 7285da0e987b77b72c8885bb35940e0f50c146bd
    Author: Kendell Clement <[email protected]>
    Date:   Fri Jun 23 16:50:33 2023 -0600

        Fix print bug for invalid fastq

    commit 9acdeac67441f9a1d55ac94b153bcb68fb89b92c
    Author: kclem <[email protected]>
    Date:   Wed Jun 21 16:03:48 2023 -0600

        Slugify before creating filename - replaces invalid characters in batch names with _

    commit f97e29c67de4c80b8d6b9cf334f363be4b514ade
    Author: Cole Lyman <[email protected]>
    Date:   Wed Jun 21 14:43:43 2023 -0600

        Add verbosity argument to CRISPRessoAggregate (#18) fixes #306 (#307)

        * Add verbosity argument to CRISPRessoAggregate (#18)

        * Allow for amplicon and guide seqs to be some variant of NA in batch (#19)

        This was discovered when attempting to infer amplicon sequences in batch mode on
        the web interface, NAs were supplied for the amplicon sequences to the sub
        CRISPResso commands.

    commit 32e1e9797da5c3033cdc588e92f06b8813961953
    Author: Mark Clement <[email protected]>
    Date:   Wed Jun 21 14:01:00 2023 -0600

        Allow for interrogation of overlapping sgRNA sites

    commit 7248ba8c4deee125ad1ec12fdf1294a84d5f6f93
    Author: Kendell Clement <[email protected]>
    Date:   Mon Jun 12 12:16:47 2023 -0600

        Check input fastq file format

        Asserts input format of fastq files - including if gzipped files are missing the gz suffix.
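
A check of this kind might look like the sketch below. It is an assumption about the approach, not the CRISPResso implementation: the function names are hypothetical. It detects gzip data by its two magic bytes (`0x1f 0x8b`) rather than by file suffix, then verifies that the first record has the four-line FASTQ shape.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip stream

def looks_gzipped(path):
    """Detect gzip by content, so a missing .gz suffix does not matter."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC

def open_fastq(path):
    """Open a FASTQ file as text, decompressing it if needed."""
    if looks_gzipped(path):
        return gzip.open(path, "rt")
    return open(path, "rt")

def first_record_is_fastq(path):
    """Check the header, separator, and sequence/quality lengths of record 1."""
    with open_fastq(path) as handle:
        lines = [handle.readline().rstrip("\n") for _ in range(4)]
    return (
        lines[0].startswith("@")
        and lines[2].startswith("+")
        and len(lines[1]) > 0
        and len(lines[1]) == len(lines[3])
    )
```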

    commit 83c8ab8f462e7d8c1d04c08c1a398b874f517251
    Author: Kendell Clement <[email protected]>
    Date:   Mon Jun 5 13:41:55 2023 -0600

        Fix CRISPRessoArgParser

    commit 14a2c8577f566e1b72d5f4e72cd6cd22079610be
    Author: Kendell Clement <[email protected]>
    Date:   Mon Jun 5 13:29:31 2023 -0600

        Cosmetic updates for command-line use

        - version bump to 2.2.13
        - If no args are provided, the command line version will print out an abbreviated help message
        - parameters can be excluded from CRISPRessoArgParser

    commit 1cd54bc1d03360c3d8121ba9e66b3589fe1cf252
    Author: Cole Lyman <[email protected]>
    Date:   Thu May 11 14:31:47 2023 -0600

        Fix multiprocessing error, don't start pool when only using single thread (#302)

        * Update README to have consistent use of `--base_editor_output` (#16)

        * Add files via upload

        * Only start process pools when using multiple processes

        This is mainly to solve the issue when running on AWS Lambda, but this should
        improve single core performance overall.

        ---------

        Co-authored-by: Kendell Clement <[email protected]>
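
The "only start process pools when using multiple processes" change (#302) follows a common pattern, sketched here with hypothetical names (`run_all`, `square`). When a single process is requested, the work runs inline and no pool is created, which helps in environments where process pools are unavailable or costly to start.

```python
from concurrent.futures import ProcessPoolExecutor

def square(x):
    """Top-level function so it can be sent to worker processes."""
    return x * x

def run_all(fn, items, n_processes=1):
    """Run fn over items, creating a process pool only when it is needed."""
    if n_processes <= 1:
        # Single process: no pool startup or teardown at all.
        return [fn(item) for item in items]
    with ProcessPoolExecutor(max_workers=n_processes) as pool:
        return list(pool.map(fn, items))

serial_result = run_all(square, [1, 2, 3], n_processes=1)
```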

    commit 92a705c939b370373a70cf6ae9f1616de33288b9
    Author: Cole Lyman <[email protected]>
    Date:   Thu May 11 14:31:06 2023 -0600

        Update `base_editor` parameters in README and add Plot Harness (#301)

        * Update README to have consistent use of `--base_editor_output` (#16)

        * Add files via upload

        ---------

        Co-authored-by: Kendell Clement <[email protected]>

    commit 7d46c4490235df45c5546b1b470e4e6a99727031
    Author: Cole Lyman <[email protected]>
    Date:   Wed May 10 15:41:33 2023 -0600

        Clarify CRISPRessoWGS intended use (#303)

        * Update README to have consistent use of `--base_editor_output` (#16)

        * Add sample plotting jupyter notebook

        * Add clarifying info to CRISPRessoWGS description

        Clarify WGS usage

    commit 833a701787bb47674b3e921c38cac6189c775cf7
    Author: Kendell Clement <[email protected]>
    Date:   Thu May 4 17:02:46 2023 -0400

        Remove debug print statements

    commit 712eb2a11825e8d36f2870deb12b35486bd633fb
    Author: Kendell Clement <[email protected]>
    Date:   Thu May 4 16:40:07 2023 -0400

        Allow dashes in filenames resolve #73

    commit a439f094745b2b5e7f032f0777d4c67e6d6f93c5
    Author: Kendell Clement <[email protected]>
    Date:   Sat Apr 22 23:41:58 2023 -0400

        Raise exceptions from within futures in plot_pool

    commit 7e807a60de2a9d18bccd034b87106ceaf7153338
    Author: Kendell Clement <[email protected]>
    Date:   Sat Apr 22 23:38:56 2023 -0400

        Fix future pandas indexing warning

        Pandas error was "FutureWarning: Calling float on a single element Series is deprecated and will raise a TypeError in the future. Use float(ser.iloc[0]) instead"

    commit 304a92aa7a7ef8c705cb070dce25d9a2e5745ba9
    Author: Cole Lyman <[email protected]>
    Date:   Thu Apr 20 13:59:27 2023 -0600

        Remove debug print statements fixes #295 (#297)

        The format string option used here is only available in Python version >=3.8.

    commit 478c06f784603e96d20f96e91993fdcc4ac35c8a
    Author: Kendell Clement <[email protected]>
    Date:   Thu Apr 13 12:09:26 2023 -0400

        Update plotCustomAllelePlot.py script for #292 (#293)

        Update type of 'max_rows' param to int
        Fix location of 'args' in crispresso2_info object

    commit bcdae39e05d530f4a4e78738c3b30f7664981919
    Author: Kendell Clement <[email protected]>
    Date:   Mon Mar 27 13:18:34 2023 -0400

        Update pooled parameter format

    commit 546446e36e7e68b527767d6c31ec341a49df2059
    Author: Kendell Clement <[email protected]>
    Date:   Tue Feb 14 16:26:23 2023 -0500

        Fix running plots in parallel (#286)

        The reason the plots were running slower before this change is because I was
        calling the plot function, not passing it to `submit`. So it was essentially
        running in serial, but worse because it was still spinning up/down the
        processes.

        Co-authored-by: Cole Lyman <[email protected]>

    commit d75f32a2eb5aeaaee866c09e5655a3e27af8b1a1
    Author: kclem <[email protected]>
    Date:   Fri Feb 10 15:45:15 2023 -0500

        Fix #283 to avoid filename collisions

        Previously, amplicon names longer than 21bp were truncated, but the check for uniqueness wasn't working, so it would overwrite some plot files. This fixes the filename collision and enforces uniqueness in reference filename prefixes. Thanks @mbiokyle29
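
One way to enforce unique truncated prefixes is sketched below. This is not necessarily the scheme CRISPResso uses: `unique_prefix` and its numeric-suffix strategy are assumptions for illustration. The 21-character limit comes from the commit message.

```python
def unique_prefix(name, used, max_len=21):
    """Truncate name to max_len characters and make it unique within used."""
    base = name[:max_len]
    candidate = base
    counter = 2
    while candidate in used:
        suffix = f"_{counter}"
        # Shorten the base so the suffixed prefix still fits in max_len.
        candidate = base[: max_len - len(suffix)] + suffix
        counter += 1
    used.add(candidate)
    return candidate

used_prefixes = set()
first = unique_prefix("amplicon_with_a_very_long_name_A", used_prefixes)
second = unique_prefix("amplicon_with_a_very_long_name_B", used_prefixes)
```

Both names share the same first 21 characters, so without the uniqueness check their plot files would overwrite each other.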

    commit e577318006cd17b2725bd028e5e56634c6eb829a
    Author: kclem <[email protected]>
    Date:   Mon Feb 6 16:37:25 2023 -0500

        Case-insensitive headers accepted in CRISPRessoPooled

    commit d34927620a4a6126a9988b3041e76f60728abbfe
    Author: Kendell Clement <[email protected]>
    Date:   Tue Jan 31 13:48:33 2023 -0500

        Fix print statement in CORE

    commit ee88b7ed89c395f68225a50dea44a2ad69d5e9a5
    Author: Kendell Clement <[email protected]>
    Date:   Tue Jan 31 13:22:51 2023 -0500

        Version bump to 2.2.12

    commit 1d4679c72d0c8b4154317c9aff5179217198e2d7
    Author: Kendell Clement <[email protected]>
    Date:   Tue Jan 31 13:01:31 2023 -0500

        Status Updates + Pooled Mixed Mode Update (#279)

        * Implement logging handler to overwrite the latest log status to file

        * Add StatusHandler to CRISPRessoCORE log

        This will take the latest log output and write it to a file (`status.txt`), the
        catch being that with each log the file is overwritten so that one can easily
        tell where CRISPResso currently is and what the error is (if any). These changes
        include some slight refactoring in order to accommodate any potential parameter
        exceptions.

        * Add StatusHandler to CRISPRessoBatch and refactor `logger.warn` to `warn`

        * Add StatusHandler to CRISPRessoPooled and a little refactoring

        * Implement `percent_complete` to the status log

        * Add StatusHandler to CRISPRessoAggregate log

        * Add StatusHandler to CRISPRessoCompare log

        * Add StatusHandler to CRISPRessoPooledWGSCompare log

        * Add StatusHandler to CRISPRessoWGS log

        * Rename `status.txt` to `CRISPResso_status.txt`

        * Modify status log names to match the tool they are generated from

        * Add percent_complete stages to CRISPRessoCORE

        These also include log statements of each plot that is being generated as well
        as fixing some variable name collisions with `ind`.

        * Format the percentage in the log to be 2 decimal places

        * Change all plotting logs from `info` to `debug` and simplify progress

        This refactors how the progress of the plots is calculated, making it much
        simpler. Before this change we would have had to keep track of the number of
        times `percent_complete` was output, but now it simply updates the percent
        complete after each amplicon is finished processing. Hopefully this will make
        things easier to maintain even though it will be a little less "accurate" (not
        sure how accurate the original implementation was...).

        * Implemented shared console log handler across all CRISPResso* calls

        This allows for easy changes to logging formatting, which was inspired by having
        to change the default logging level. The default logging level needs to be set
        at `logging.DEBUG` in order for the debug log statements to not be ignored for
        the running and status logs.

        * Add ability to set the verbosity level to each CRISPResso* tool

        This allows users to set a verbosity level between 1 and 4 using the
        `-v`/`--verbosity` CLI parameter. If the `--debug` flag is present, then the
        level will default to 4, being the most verbose.

        * Implement showing the last seen `percent_complete` when none is provided

        * Keep track of and log when multiple parallel runs are completed

        These changes modify `CRISPRessoMultiProcessing.run_crispresso_cmds` such that
        we can now display when a run is completed. This potentially breaks how
        signals and interrupts are handled with multiple runs happening, but this needs
        to be reviewed.

        * Add debug and percentage complete to CRISPRessoBatch

        * Add percent complete to CRISPRessoPooled

        * Add debug and percent_complete message to CRISPRessoAggregate

        * Add `percent_complete` to CRISPRessoCompare

        * Add `percent_complete` to CRISPRessoPooledWGSCompare

        * Add status and `percent_complete` to CRISPRessoMeta

        * Add `verbosity` arguments to CRISPRessoCompare and CRISPRessoPooledWGSCompare

        * Fixing documentation to match pooled headers

        * Header removal bug fix change documentation to guide_seq

        * Update documentation and help feature for CRISPRessoPooled

        * Remove extra newlines from CRISPRessoPooled -h

        * Make variable names as clear as my firstborn child's name

        * Update one more variable name

        * Fix bug to flow CRISPRessoPooled options to sub command

        * Make amplicon file args variable name clear

        * Update how parameters are set and retrieved from parameter object

        The refactor in the previous commit changed the type of the arguments to a
        dictionary which doesn't have the parameters as attributes, and this commit
        fixes that error.

        * Add note in output header for change in default CRISPRessoPooled

        In the next release (2.3.0) the `--demultiplex_only_at_amplicons` will be the
        default when running in mixed-mode. This is to allow for inexact alignments of
        the reads and the amplicons to the genome. For more context, see this issue
        https://github.com/pinellolab/CRISPResso2/issues/276

        * Clarify the verbosity parameter help message

        * Separate out parameters to `normalize_name` in CRISPRessoCORE

        * Separate out parameters to `normalize_name` in CRISPRessoWGS

        * Separate out parameters to `normalize_name` in CRISPRessoPooled

        * Separate out parameters to `normalize_name` in CRISPRessoCompare

        * Fix bug in CRISPRessoPooled by replacing `database_id` with `normalize_name`

        * Refactor `run_crispresso_cmds` to not require a `logger`

        This commit implements the functionality to make the `logger` object optional by
        seeing which module called the `run_crispresso_cmds` function and obtaining the
        correct object from that module name.

        The function also immediately returns when no commands are passed to it.

        * Add amplicon name to plotting debug statements in CRISPRessoCORE

        ---------

        Co-authored-by: Cole Lyman <[email protected]>
        Co-authored-by: Cole Lyman <[email protected]>
        Co-authored-by: Cole Lyman <[email protected]>
        Co-authored-by: Samuel Nichols <[email protected]>
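
The status logging described in this commit (#279) can be sketched as a small `logging.Handler` subclass. This is an illustration, not the `StatusHandler` from the repository: `StatusFileHandler`, the line format, and `verbosity_to_level` are hypothetical, and the mapping from verbosity 1-4 to logging levels is an assumption (the commit only states that 4 is the most verbose). The handler overwrites the file on every record, formats the percentage to two decimal places, and reuses the last seen `percent_complete` when a record does not supply one.

```python
import logging

class StatusFileHandler(logging.Handler):
    """Overwrite a status file with the latest log message (hypothetical)."""

    def __init__(self, path):
        super().__init__()
        self.path = path
        self.last_percent = 0.0

    def emit(self, record):
        # percent_complete arrives through logging's `extra` mechanism.
        percent = getattr(record, "percent_complete", None)
        if percent is None:
            percent = self.last_percent  # fall back to the last seen value
        else:
            self.last_percent = percent
        # Mode "w" truncates, so the file only ever holds the latest status.
        with open(self.path, "w") as handle:
            handle.write(f"{percent:.2f}% {record.getMessage()}\n")

def verbosity_to_level(verbosity):
    """Map a 1-4 verbosity setting to a logging level (assumed mapping)."""
    levels = {1: logging.ERROR, 2: logging.WARNING, 3: logging.INFO, 4: logging.DEBUG}
    return levels[verbosity]
```

A caller would attach the handler to its logger and pass progress with `logger.info("Aligning reads", extra={"percent_complete": 12.5})`.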

    commit ff7eca76e6a3a08af4ac18ac4e88d20f2a06b1f9
    Author: Kendell Clement <[email protected]>
    Date:   Thu Jan 26 15:27:27 2023 -0500

        CRISPRessoPooled custom header fix (#278)

        * Fixing documentation to match pooled headers

        * Header removal bug fix change documentation to guide_seq

        * Update documentation and help feature for CRISPRessoPooled

        * Remove extra newlines from CRISPRessoPooled -h

        * Make variable names as clear as my firstborn child's name

        * Update one more variable name

        Co-authored-by: Samuel Nichols <[email protected]>

    commit 104866e1080c973bb025d1a5ba59b19dca1658af
    Author: Cole Lyman <[email protected]>
    Date:   Thu Jan 5 14:00:26 2023 -0700

        Fix deprecated numpy type names (fixes #269) (#270)

        In the most recent version of numpy (1.24) some of the types have been
        deprecated. This commit fixes these errors.

    commit 58a8e42df88b66fad6b4f6ad04a5b9d9d43d01b4
    Author: Cole Lyman <[email protected]>
    Date:   Thu Jan 5 06:49:35 2023 -0700

        Add snippet about installing CRISPResso2 via bioconda on Apple silicon (#274)

        I have suffered enough trying to debug my installation, so hopefully this helps
        someone else.

        Co-authored-by: Cole Lyman <[email protected]>

    commit b9851e98104602eb78c2b384105267624295e9d3
    Author: Cole Lyman <[email protected]>
    Date:   Thu Dec 22 13:30:23 2022 -0700

        Fix bug when pooled bam is input (#265)

        This change checks to see if a bam file was input, and if so it doesn't try to
        remove any intermediate files because there aren't any.

        Co-authored-by: Cole Lyman <[email protected]>

    commit b822612642043e75a19042941f69b457ce51f517
    Author: Kendell Clement <[email protected]>
    Date:   Mon Dec 19 15:26:45 2022 -0500

        Delete vscode settings

    commit b99aa624dec68ef7d19264340ce0cafa829625f4
    Author: Kendell Clement <[email protected]>
    Date:   Mon Dec 19 13:29:14 2022 -0500

        Clarify input param help for pooled bam

    commit 3fae1e8b821ec6b1890bff6561fa8fa67dc49a04
    Author: Kendell Clement <[email protected]>
    Date:   Mon Dec 19 13:28:54 2022 -0500

        Fix #235 - Cigar string is * if read unaligned

        Previously, the bam would set the cigar string to 0 if the read was unaligned. This breaks the sam->bam conversion and causes the errors in #235.
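
For reference, the SAM format uses `*` for an unavailable CIGAR string, and an unaligned read also carries the unmapped flag (`0x4`), `*` as the reference name, and position 0. The sketch below builds such a line. `sam_line` is a hypothetical helper, not the CRISPResso function.

```python
UNMAPPED_FLAG = 0x4  # SAM flag bit: segment unmapped

def sam_line(read_name, seq, qual, ref_name=None, pos=None, cigar=None):
    """Build one tab-separated SAM record with the 11 mandatory fields."""
    if cigar is None:
        # Unaligned read: CIGAR must be "*", not 0.
        flag, ref_name, pos, mapq, cigar = UNMAPPED_FLAG, "*", 0, 0, "*"
    else:
        flag, mapq = 0, 255  # 255 means mapping quality is not available
    fields = [read_name, flag, ref_name, pos, mapq, cigar, "*", 0, 0, seq, qual]
    return "\t".join(str(field) for field in fields)

unaligned = sam_line("read1", "ACGT", "IIII")
aligned = sam_line("read2", "ACGT", "IIII", ref_name="amplicon", pos=1, cigar="4M")
```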

    commit c65ba07dc5a983453cdf7bb1e27005230dac6f1b
    Author: Cole Lyman <[email protected]>
    Date:   Thu Dec 8 13:48:17 2022 -0700

        Add deprecation notice (#260)

        * Add FLASh and Trimmomatic deprecation notice to CLI output

        * Add Edilytics email address to CLI output

    commit 2a30e5a45f5350ee7c6435bce1cd4edc4d31668a
    Author: Kendell Clement <[email protected]>
    Date:   Tue Dec 6 12:16:19 2022 -0500

        Format filterReadsOnSequencePresence script

    commit 9d764414edd88a46ad5e4f496e4f1c8d5d60ce3e
    Author: Kendell Clement <[email protected]>
    Date:   Fri Dec 2 22:12:54 2022 -0500

        Clarify default CRISPRessoPooled settings for use_legacy_bowtie2_options_string

    commit 9ddea40f7f02b546941ddaa4c71fc5283075051a
    Author: kclem <[email protected]>
    Date:   Mon Nov 14 10:33:04 2022 -0500

        Add check for prime editing extension sequence in prime edited sequence

        If the user specifies prime_editing_override_prime_edited_ref_seq, it might not contain the extension sequence (for example, if the extension sequence was not provided in the appropriate orientation), so check for that here. The extension sequence should be provided as the reverse complement of the prime-edited sequence.

    commit 152f2dd5001da7090641ee8a1326bde9f7e8104e
    Author: kclem <[email protected]>
    Date:   Wed Nov 9 11:53:41 2022 -0500

        Version bump to 2.2.11a

    commit 9ed356e3a0c6c316d0860d121772f80ddca6de1d
    Author: kclem <[email protected]>
    Date:   Wed Nov 9 11:47:30 2022 -0500

        Add param to override prime editing sequence checks

        CRISPResso checks that prime editing guides are provided in the proper orientation (e.g. pegRNA 3'->5', spacer sequence 5'->3') and checks these orientations by alignment. Sometimes, the alignment can be better in the opposite direction, and this parameter allows these checks to be overridden. Otherwise, these checks would halt the program and produce the output 'The prime editing pegRNA spacer sequence appears to be given in the 3\'->5\' order. The prime editing pegRNA spacer sequence (--prime_editing_pegRNA_spacer_seq) must be given in the RNA 5\'->3\' order.'

    commit 39dd80afb98a22b7edb6f801c363d86bb77eeb5b
    Author: kclem <[email protected]>
    Date:   Wed Nov 9 10:06:51 2022 -0500

        Update filterReadsOnSequencePresence.py

    commit fe55526927e3fb6e17c9a8a6f59c7057bc1e14eb
    Author: Kendell Clement <[email protected]>
    Date:   Mon Nov 7 22:25:16 2022 -0500

        Add script to filter input based on sequence presence

    commit 713e57a19c35180035ca35e11a5820065eda0198
    Author: Kendell Clement <[email protected]>
    Date:   Tue Oct 18 16:02:26 2022 -0400

        Allow spaces in read names for CRISPRessoWGS

    commit 39ce008bdddccdd8229c0ba185dce78bc2f66968
    Author: Cole Lyman <[email protected]>
    Date:   Sat Oct 8 21:09:58 2022 -0600

        Fix typo of CRISPResssoPlot when plotting nucleotide quilt (#250)

    commit 6a2b342c8503b7327c0a2414edfbd16912d60ca5
    Author: Kendell Clement <[email protected]>
    Date:   Sat Oct 8 23:08:47 2022 -0400

        Batch amplicon plots (#251)

        * Error out if HDR amplicon matches existing amplicon

        * Add check for amplicon sequence uniqueness

        * Fix bug with bam_input not having bam_output

        * Test for no returned lines in auto mode, version bump to 2.2.11

        * Fix pandas deprecation of df.append

    commit 726b2b93d6e419a1b0aa6a968c97edc55b4cc5a8
    Author: Kendell Clement <[email protected]>
    Date:   Thu Oct 6 16:32:02 2022 -0400

        Fix CRISPRessoBatch plot pool bug when plots are suppressed

    commit 7e5049c4dfb88cbc87c91935a91d1f51120a10c2
    Author: Cole Lyman <[email protected]>
    Date:   Wed Sep 21 21:04:51 2022 -0600

        Fix batch quilt plot name (#249)

        This fixes an incorrectly named allele quilt plot input in CRISPRessoBatch.

    commit 1821ca5029c5a1485733f13ab3f2048b4f1fa04e
    Author: Kendell Clement <[email protected]>
    Date:   Thu Sep 15 15:49:08 2022 -0400

        Version bump to 2.2.10

    commit c5f79aebfc1ae209f4ee320df250eed89a02787c
    Author: Cole Lyman <[email protected]>
    Date:   Wed Sep 14 14:24:55 2022 -0600

        Parallel plot refactor (#247)

        * Fix duplicate plotting in CRISPRessoBatch aggregate

        * Refactor multiprocessing plots in CRISPRessoBatch

        * Refactor multiprocessing plots in CRISPRessoCORE

        * Refactor multiprocessing plots for CRISPRessoAggregate

    commit 4ed5e24e6cc1dd8068e2391573ae2438acd32db2
    Author: Kendell Clement <[email protected]>
    Date:   Tue Sep 13 14:12:11 2022 -0400

        print files in curr dir if Aggregate can't find files

    commit ce25bc06f29988e7a10afd0b6a09ba0caf0950e0
    Author: Kendell Clement <[email protected]>
    Date:   Mon Sep 12 10:32:57 2022 -0400

        Spelling typo

    commit c15f01c75083403f17c58c121b2afe97e9f2a1ec
    Author: Kendell Clement <[email protected]>
    Date:   Tue Sep 6 17:49:52 2022 -0400

        Add helper function to create alignment scoring matrix

        New scoring matrix can be created using CRISPResso2Align.make_matrix()

    commit c80f82838c5a228b79ad4484092877cfee08e02c
    Author: Cole Lyman <[email protected]>
    Date:   Mon Aug 22 18:28:33 2022 -0600

        Add `zip_output` (#240)

        * Making zip of results

        * Zip command added, if zip is true place_report_in_output_folder is also true, zip removes all files while zipping

        * Adding --zip to compare and pooled/wgs compare

        * Add more formatting changes to CRISPRessoShared

        * Refactoring propagate_crispress_options so only one version exists

        * Zip added to arguments_to_ignore and warning added when changing arguments

        * Restore styling

        * Update README to include --zip

        * Rename --zip to --zip_output

        * Change --zip to --zip_output in CompareCORE and PooledWGSCompareCORE

        * Bug fix arg to args

        Co-authored-by: Samuel Nichols <[email protected]>
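The coupling between `--zip_output` and `--place_report_in_output_folder` described in this commit could look roughly like the following. This is only a sketch: the helper name `parse_args` is hypothetical and the real option handling in CRISPRessoShared may be organized differently.

```python
import argparse


def parse_args(argv):
    """Parse a minimal set of options (hypothetical helper, not the CRISPResso2 parser)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--zip_output", action="store_true")
    parser.add_argument("--place_report_in_output_folder", action="store_true")
    args = parser.parse_args(argv)
    # Zipping the results implies the report must live inside the output folder,
    # so the second option is forced on whenever the first is set.
    if args.zip_output and not args.place_report_in_output_folder:
        args.place_report_in_output_folder = True
    return args
```

Forcing the dependent option after parsing keeps the rule in one place instead of repeating it in every tool that accepts `--zip_output`.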

    commit 5de3d7286d8e33c7cf4d3615fce715806e72f511
    Author: Kendell Clement <[email protected]>
    Date:   Thu Aug 11 21:42:34 2022 -0400

        Fix fix to aggregate for CRISPRessoWGS

    commit a2294c266f43b14969a5d6474076f31a77a57173
    Author: Kendell Clement <[email protected]>
    Date:   Thu Aug 11 21:40:50 2022 -0400

        Fix bug in aggregate for WGS

    commit 7ce3eb4abe4b8ceac933272ac9cb16a8bedf26a3
    Author: Kendell Clement <[email protected]>
    Date:   Mon Aug 8 21:53:45 2022 -0400

        Update CRISPRessoWGS to allow non-word characters in region names

    commit 040ac0033d6e250f4e3a412101874cf5e914e08a
    Author: kclem <[email protected]>
    Date:   Mon Aug 8 16:04:59 2022 -0400

        Enable processing of cram files by CRISPRessoWGS

        Adds --reference to samtools view when viewing cram files
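A minimal sketch of how a `samtools view` command might gain `--reference` for CRAM input. The function name and the extension check are assumptions made for illustration; the actual logic in CRISPRessoWGS is not shown in this log.

```python
def samtools_view_command(alignment_path, reference_fasta=None):
    """Build an argument list for `samtools view` (hypothetical helper)."""
    cmd = ["samtools", "view"]
    # CRAM files are reference-compressed, so the reference FASTA is passed along.
    if alignment_path.endswith(".cram"):
        if reference_fasta is None:
            raise ValueError("a reference FASTA is required to read CRAM input")
        cmd += ["--reference", reference_fasta]
    cmd.append(alignment_path)
    return cmd
```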

    commit cf112a0caba8789e28530cc09171285ec6ea9b4c
    Author: kclem <[email protected]>
    Date:   Mon Aug 8 14:55:46 2022 -0400

        Auto amplicon detection for interleaved input

        Enables processing of interleaved fastq files for guess_guides and guess_amplicons, as well as get_most_frequent_reads. When interleaved input is present, the input is first separated into R1/R2 files, then processing is performed.
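The separation step can be pictured as follows. This is an illustrative sketch only: it assumes strict 4-line FASTQ records that alternate R1, R2, and the function name is hypothetical rather than taken from CRISPResso2.

```python
from itertools import islice


def split_interleaved_fastq(lines):
    """Split interleaved FASTQ lines into (r1_records, r2_records).

    Each record is a list of 4 lines; records are assumed to alternate R1, R2.
    """
    r1_records, r2_records = [], []
    line_iter = iter(lines)
    record_index = 0
    while True:
        record = list(islice(line_iter, 4))
        if not record:
            break
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        # Even-numbered records go to R1, odd-numbered records to R2.
        (r1_records if record_index % 2 == 0 else r2_records).append(record)
        record_index += 1
    if len(r1_records) != len(r2_records):
        raise ValueError("interleaved input has an unpaired read")
    return r1_records, r2_records
```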

    commit 4ba524dc7b947feca8a0f743837844f9febc2171
    Author: Cole Lyman <[email protected]>
    Date:   Thu Aug 4 11:32:11 2022 -0600

        Potential fix for aggregate plots in Batch mode (#237)

    commit 6097a8a104d3f156ef7c08e196ac37e32bf04c71
    Author: Kendell Clement <[email protected]>
    Date:   Thu Jul 21 22:45:48 2022 -0400

        Fix pct_vectors in crispresso2_info json object

    commit 65a079d86d6f386793397398f839c46014b54543
    Author: Kendell Clement <[email protected]>
    Date:   Wed Jul 20 23:46:37 2022 -0400

        Fix more readme spelling bugs

    commit e817376ecd54cdea1f29e303ca25b9e7d1d38333
    Author: Kendell Clement <[email protected]>
    Date:   Wed Jul 20 23:42:23 2022 -0400

        Fix bug in readme spelling

    commit 49740ba1d66ed6d13a9e154b8b17bc8b5186581d
    Author: Kendell Clement <[email protected]>
    Date:   Wed Jul 20 16:10:09 2022 -0400

        Fix loading of crispresso info from WGS and Pooled

    commit b68a43271115251b18e8955e285ccc18f549e8cd
    Author: Kendell Clement <[email protected]>
    Date:   Thu Jul 14 14:11:04 2022 -0400

        Add plotly to dockerfile

    commit b0b7d41d697304d0d5fc93e3346c9de1b98ba41d
    Author: Kendell Clement <[email protected]>
    Date:   Thu Jul 14 14:10:00 2022 -0400

        Fix #231 Allow N's in bam output (Try 2)

    commit c460b3e73fd06a230dbac2e37c86b833144ebf94
    Author: Kendell Clement <[email protected]>
    Date:   Thu Jul 14 14:09:10 2022 -0400

        Revert "Fix #231 Allow N's in bam output"

        This reverts commit 2f6ad1dbe05210af9ccc1b1f17783cd212a888d3.

    commit 2f6ad1dbe05210af9ccc1b1f17783cd212a888d3
    Author: Kendell Clement <[email protected]>
    Date:   Thu Jul 14 13:52:37 2022 -0400

        Fix #231 Allow N's in bam output

    commit 0a2419e518dc9b3520058c3927f98b31cd51347e
    Author: Cole Lyman <[email protected]>
    Date:   Fri Jul 8 21:10:01 2022 -0600

        Fix bug when name is provided instead of amplicon_name in pooled input file (#229)

        Also, raise an exception (instead of incorrectly executing) when there are not
        enough matched parameters in the pooled input file.

    commit cb58212379803788c04ca5793baaa760cbbeaa81
    Author: Cole Lyman <[email protected]>
    Date:   Fri Jul 8 21:09:49 2022 -0600

        Fix bug when comparing two samples with the same name. (#228)

    commit e8a796f5f451409cbafed4404dfba4b6b8a124ca
    Author: Kendell Clement <[email protected]>
    Date:   Thu Jun 23 21:30:23 2022 -0400

        Version bump to 2.2.9

    commit 632143ddedea48bab9229baeb4bf3ea4d1f658d6
    Author: Cole Lyman <[email protected]>
    Date:   Mon Jun 20 19:53:14 2022 -0600

        Don't run global frameshift plot when there are no reads (#226)

        When there are no reads (i.e. global_MODIFIED_FRAMESHIFT +
        global_MODIFIED_NON_FRAMESHIFT + global_NON_MODIFIED_NON_FRAMESHIFT == 0) there
        was a bug when trying to compute the pie chart, because all of the values in the
        pie chart are 0. This fix makes sure that there is at least one read in
        order for the plot to be constructed properly.
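The guard described here amounts to checking the total before plotting. A minimal sketch, with a hypothetical function name and the three counts passed in explicitly:

```python
def should_plot_frameshift_pie(modified_frameshift,
                               modified_non_frameshift,
                               non_modified_non_frameshift):
    """Return True only when at least one read contributes to the pie chart."""
    total = (modified_frameshift
             + modified_non_frameshift
             + non_modified_non_frameshift)
    # A pie chart whose slices are all zero cannot be drawn, so skip it.
    return total > 0
```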

    commit 4bb06218e835d2624d53fd401542caef6f8a3a55
    Author: kclem <[email protected]>
    Date:   Fri Jun 3 16:57:02 2022 -0400

        Improvements for guide inference in 'auto' mode

        In 'auto' mode, a putative guide sequence is selected at the site of maximal editing. If the site of maximal editing happens near the end of the guide (e.g. base 0), many things will break (e.g. quantification windows). This update excludes bases from being used to find the guide using the --exclude_bp_from_left and --exclude_bp_from_right parameters. By default, these parameters are 15bp, so the first and last 15bp would not be selected as the site of maximal editing and thus would not become the site of a guide sequence. In addition, the site of maximal editing must have 3x the magnitude of the background.
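A rough sketch of the selection rule, under stated assumptions: the input is a per-base modification frequency list, and "background" is taken to be the mean of the other positions inside the allowed window. The commit does not say how background is computed, so that definition and the function name are hypothetical.

```python
def pick_max_editing_site(mod_freqs, exclude_bp_from_left=15,
                          exclude_bp_from_right=15,
                          min_fold_over_background=3.0):
    """Return the index of the maximal-editing site, or None if none qualifies."""
    mod_freqs = list(mod_freqs)
    start = exclude_bp_from_left
    end = len(mod_freqs) - exclude_bp_from_right
    if end - start < 2:
        return None  # nothing left to search after trimming both ends
    window = mod_freqs[start:end]
    best_offset = max(range(len(window)), key=lambda i: window[i])
    best_value = window[best_offset]
    # Assumed background: mean frequency of the remaining window positions.
    others = window[:best_offset] + window[best_offset + 1:]
    background = sum(others) / len(others)
    if best_value <= 0 or best_value < min_fold_over_background * background:
        return None
    return start + best_offset
```

With this rule, a strong peak inside the first or last 15bp is never considered, and a flat profile yields no putative guide site.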

    commit 9d64de187835b2553ad2b4374d32edab27f83645
    Author: Kendell Clement <[email protected]>
    Date:   Thu Jun 2 20:22:25 2022 -0400

        Update README.md

    commit 6aafc5387986f5089ba55b68d128343d68052792
    Author: Simon P Shen <[email protected]>
    Date:   Tue May 31 17:42:53 2022 -0400

        directory in quotes in batch cmd (#222)

        Add quotes around output folder for folders that have spaces.
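For illustration, the same goal can be reached with `shlex.quote` from the Python standard library. This is a swapped-in technique rather than the literal change in this commit, and the helper name is hypothetical.

```python
import shlex


def build_crispresso_command(output_folder):
    """Build a shell command string whose output folder survives spaces."""
    # shlex.quote wraps the path in quotes only when the shell would need them.
    return "CRISPResso --output_folder " + shlex.quote(output_folder)
```

Splitting the result with `shlex.split` returns the folder as a single argument even when it contains spaces.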

    commit 432f163ac68b9a650d1fd326171aadc505ee87f4
    Author: Kendell Clement <[email protected]>
    Date:   Tue May 24 23:38:36 2022 -0400

        CRISPRessoBatch fills NA values in batch settings

        NA values in CRISPRessoBatch are filled with the value from args - either the default value or the value from the command line args (if set)
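The fill rule can be sketched without pandas as below. The row and argument names are illustrative assumptions, and treating only `None` and NaN as "NA" is a simplification of whatever the batch file parser actually produces.

```python
import math


def is_na(value):
    """Treat None and float NaN as missing values."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def fill_na_from_args(batch_rows, args):
    """Return copies of the batch rows with missing values taken from args.

    `args` holds either the defaults or the values given on the command line.
    """
    filled = []
    for row in batch_rows:
        new_row = dict(row)
        for key, value in row.items():
            if is_na(value) and key in args:
                new_row[key] = args[key]
        filled.append(new_row)
    return filled
```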

    commit 6de774adbad3aa8cd99d07b0ba7692984b356cd4
    Author: kclem <[email protected]>
    Date:   Mon May 23 14:18:02 2022 -0400

        Fix file naming bug for HDR outputs

        In html file, figures 4e and 4f incorrectly referenced figure 4d. This fixes this bug.

    commit b88fec0668a4082a12ead3d26582e86d829dd7cc
    Author: Kendell Clement <[email protected]>
    Date:   Sat May 21 00:32:15 2022 -0400

        For bam_output, fix bug that wrote unaligned lines twice

    commit 3564e77ebcdedb4b01cc01dcca18ba3221fac67c
    Author: Kendell Clement <[email protected]>
    Date:   Thu May 19 16:32:18 2022 -0400

        Update README with CRISPRessoPooled headers and bam_output parameters

    commit bc08d81f17cb1929d1c37a1773cffcf36fb12fe2
    Author: Kendell Clement <[email protected]>
    Date:   Thu May 19 16:11:30 2022 -0400

        Add more links to tools

    commit 006c497a379ecd94b017a883a5db887861e1586a
    Author: Kendell Clement <[email protected]>
    Date:   Thu May 19 16:08:14 2022 -0400

        Add links to tools

    commit dc8243373ad00d6bd467fc30c59942596ff0c5d6
    Author: Kendell Clement <[email protected]>
    Date:   Mon May 16 21:38:06 2022 -0400

        fastq_to_bam implementation (#219)

    commit e88b6833977c6b2768299e0b2e7af623e3a9ae7c
    Author: Kendell Clement <[email protected]>
    Date:   Sun May 8 02:14:13 2022 -0400

        Fix bug for when guides don't agree in CRISPRessoAggregate

    commit 7eb763116a8c60603f1cd654645215767ee8eb52
    Author: Kendell Clement <[email protected]>
    Date:   Thu May 5 03:28:21 2022 -0400

        Fix bug for case of empty summary plots in report generation

    commit 0324fa67d14ed945f0c9531d9bcf73ebcf4ca042
    Author: Kendell Clement <[email protected]>
    Date:   Thu May 5 03:28:02 2022 -0400

        Create report for number of significant bases in CRISPRessoCompare

    commit e3c9d0026a9ee6732f3ed6bdcf2a824850d7e66a
    Author: Kendell Clement <[email protected]>
    Date:   Wed May 4 22:43:11 2022 -0400

        Update pickle to json in readme and CRISPRessoPooledWGSCompare

    commit 1553f7977c12bf1091a20ca55b878bccfb739b61
    Author: Kendell Clement <[email protected]>
    Date:   Wed May 4 18:10:04 2022 -0400

        Merge pull request #4 from pinellolab/master (#218)

    commit bcecbfc047d294e26f381a6668e08cb4db24445c
    Merge: 15b0e05b bb13e007
    Author: Kendell Clement <[email protected]>
    Date:   Wed May 4 18:06:37 2022 -0400

        Merge branch 'master' into master

    commit bb13e007738d6e7a4909e01f03daff592f334f36
    Merge: af4ab6e8 d0b41483
    Author: Kendell Clement <[email protected]>
    Date:   Wed May 4 17:59:32 2022 -0400

        Merge branch 'master' of https://github.com/edilytics/CRISPResso2

    commit 15b0e05b9e03bbec5236e58776ddf9aa2f93180e
    Author: Kendell Clement <[email protected]>
    Date:   Wed May 4 17:54:52 2022 -0400

        2 flexible pooled input (#217)

        * Batch type coerce and r2 file check

        * Upgrade tabs for bootstrap5

        * Update readme with additional pooled amplicon file headers

        Co-authored-by: Samuel Nichols <[email protected]>

    commit d0b41483bee704940ba60c58289f412b04c71659
    Author: Kendell Clement <[email protected]>
    Date:   Wed May 4 13:43:43 2022 -0400

        Update README.md

    commit ce49fab5301cb73ba0daf6c765e350eb083c76f1
    Merge: 5f909713 b913fcb4
    Author: Kendell Clement <[email protected]>
    Date:   Wed May 4 13:40:30 2022 -0400

        Merge pull request #3 from edilytics/2-flexible-pooled-input

        Add flexibility to CRISPRessoPooled amplicon input by allowing headers. Also, prime editing and quantification window coordinate parameters can be passed to CRISPRessoPooled.

    commit b913fcb402a8ba3106c3ff7913563a33d8d19fca
    Author: Kendell Clement <[email protected]>
    Date:   Wed May 4 13:38:25 2022 -0400

        Update CRISPRessoPooledCORE.py

        Replace process to read header, increase flexibility for column order

    commit 945bf31f16530b7ce25b89095b2c7005bf146117
    Merge: 7b8f6788 5f909713
    Author: Kendell Clement <[email protected]>
    Date:   Wed May 4 12:45:24 2022 -0400

        Merge branch 'master' into 2-flexible-pooled-input

    commit 5f9097133765736a7c2fe3c8e9b730845fed0b70
    Author: Kendell Clement <[email protected]>
    Date:   Wed May 4 12:23:44 2022 -0400

        Version bump to 2.2.8

    commit c4a94ce0e06c6ebae13e128fbe6b708e635121c4
    Author: Kendell Clement <[email protected]>
    Date:   Wed May 4 00:13:17 2022 -0400

        Fix summary plot representation for multi reports

        * fixed old reference to make_multi_report which called old summary plot format
        * renamed summary_plot to summary_plots to reflect a dict with multiple plots

    commit 62900e9ae6fa37ce99a04f12a63ed5c912f75042
    Author: Cole Lyman <[email protected]>
    Date:   Tue May 3 20:47:52 2022 -0600

        Large aggregation (#192)

        * Squashed commit of the following:

        commit 8564eb03f0d9e62abf4b7528baf5c2ae296be8f9
        Merge: f6ef62c 07cc7d8
        Author: Kendell Clement <[email protected]>
        Date:   Tue Jan 11 16:20:15 2022 -0500

            Merge branch 'indel-alignment-fix' of https://github.com/edilytics/CRISPResso2 into indel-alignment-fix

        commit 07cc7d856ab3fcbbaa5381f17f29568192388887
        Author: Cole Lyman <[email protected]>
        Date:   Fri Dec 10 15:29:59 2021 -0700

            Fix bug in `find_indels_substitutions`

            This bug occurred when there was a deletion at the end of a sequence, and was
            thus not properly accounted for.

        commit f6ef62cfdf909adac1b10ea86555cd218f8b2a74
        Author: Cole Lyman <[email protected]>
        Date:   Fri Dec 10 15:29:59 2021 -0700

            Fix bug in `find_indels_substitutions`

            This bug occurred when there was a deletion at the end of a sequence, and was
            thus not properly accounted for.

        commit 7212f87f4be60057a6c848947ff6b5efde132a25
        Author: Cole Lyman <[email protected]>
        Date:   Fri Dec 10 15:26:17 2021 -0700

            Add a unit test for `find_indels_substitutions`

            This unit test checks for deletions at the end of a sequence, which are
            inherently outside of the include_indx_set window.

        commit d50b4e903b973c71a275e31d470b40e59280ee13
        Author: Cole Lyman <[email protected]>
        Date:   Fri Dec 10 15:03:22 2021 -0700

            Fix a bug in `find_indels_substitutions`

            The bug that this commit fixes is when an insertion occurs at the edge of the
            include indexes. The trouble with this earlier was that it was using the `idx`
            to calculate the size of the insertion, but the `idx` wasn't being incremented
            anymore because it was outside of the include window.

        commit 4db066f7bc333b7662a9232ac732ebb33ac3ace8
        Author: Cole Lyman <[email protected]>
        Date:   Fri Dec 10 15:01:39 2021 -0700

            Add test case for `find_indels_substitutions`

            This test case is extracted from the CRISPRessoBatch integration test and
            provides an example where there is an insertion at the edge of the include
            index.

        commit 3b3a7417f5bbd6c2785a2af54a47e01d2e820451
        Author: Cole Lyman <[email protected]>
        Date:   Fri Dec 10 11:37:07 2021 -0700

            Fix bug in CRISPRessoCompare where sample names were not properly set

            This was a place where it was (partially) missed during the crispresso2_info
            object refactoring.

        commit e9f5eff3d95b676b5ee2e23371a5604f600d34b2
        Author: Cole Lyman <[email protected]>
        Date:   Fri Dec 10 15:26:17 2021 -0700

            Add a unit test for `find_indels_substitutions`

            This unit test checks for deletions at the end of a sequence, which are
            inherently outside of the include_indx_set window.

        commit d4d45a918254ab19a7e7956e9e731389c6f36ecb
        Author: Cole Lyman <[email protected]>
        Date:   Fri Dec 10 15:03:22 2021 -0700

            Fix a bug in `find_indels_substitutions`

            The bug that this commit fixes is when an insertion occurs at the edge of the
            include indexes. The trouble with this earlier was that it was using the `idx`
            to calculate the size of the insertion, but the `idx` wasn't being incremented
            anymore because it was outside of the include window.

        commit 13f00bb40239c83e6e5cf844561fdb7000d3d9ab
        Author: Cole Lyman <[email protected]>
        Date:   Fri Dec 10 15:01:39 2021 -0700

            Add test case for `find_indels_substitutions`

            This test case is extracted from the CRISPRessoBatch integration test and
            provides an example where there is an insertion at the edge of the include
            index.

        commit 659ae34e8fd106f7ecc163b5bea0b5a80ab0283c
        Author: Cole Lyman <[email protected]>
        Date:   Fri Dec 10 11:37:07 2021 -0700

            Fix bug in CRISPRessoCompare where sample names were not properly set

            This was a place where it was (partially) missed during the crispresso2_info
            object refactoring.

        * Add parameter `--suppress_batch_summary_plots`

        If many runs are run at the same time, batch summary plots may fail because they are too large for matplotlib. This parameter `--suppress_batch_summary_plots` allows individual runs to be plotted, but suppresses batch summary plots that may otherwise be too big.

        * Pep formatting cleanup

        * Add summary nucleotide plots to aggregate

        * Aggregate plots are paginated

        * Update CRISPRessoAggregateCORE.py

        Remove max sample limit for plotting

        * Add --max_samples_per_summary_plot to CRISPRessoAggregate

        Parameterize the max number of samples to plot on each page of reports. Additional PDFs will be created with this number of samples on them.

        * Add plotly function to plot an interactive heatmap

        * Fix deprecated numpy type to suppress warning

        * Add plotting of heatmaps to CRISPRessoAggregateCORE to summarize modification types

        These heatmaps are interactive (zoomable and pannable) and show for each sample
        the percentage of insertions, substitutions, and deletions.

        * Add the heatmap summaries to the CRISPRessoAggregate report

        * Update Bootstrap to 5.1.3

        This is mainly so that we can use the fullscreen modal functionality in this version.

        * Move the plotly heatmaps to a Bootstrap modal

        * Fix bug where plots were not filling up entire modal.

        I have tried countless different ways for this to work, and this is the best
        that I can come up with. After the modal is opened it triggers the plot to
        resize, and then for some reason you need to trigger the resize event. I think
        this is because a `div` changing size won't actually trigger the resizing of the
        plot (and neither will just calling `Plotly.Plots.resize`...?!).

        * Update the axis labels and add autosize to plotly heatmaps

        I'm pretty sure the autosize doesn't do anything, but it is there for good
        measure.

        * Abandon attempts to make plots fullscreen

        This includes removing the Bootstrap modal (two out of the three plots would
        resize properly and I couldn't figure out a way to have the plot displayed
        outside of the modal). I have left in some javascript to make the plot
        fullscreen, but I couldn't get the formatting quite right and the plot wasn't
        much bigger in the fullscreen version because there was a ton of space between
        the plot and the heatmap. If some brave soul would like to tackle it, feel free!

        * Rename and refactor how plot data is passed around

        I have consolidated how the plot data is passed around, so that now you can pass
        in only one dict with all of the information instead of 4 or 5 separate
        parameters. I also renamed the `heatmap_plot_*` to
        `allele_modification_heatmap_*`.

        * Implement the line plot version of the modification percentages

        This also includes correctly resizing the plot when the line plot tab is
        selected!

        * Change default `max_samples_per_summary_plot` to be 150 instead of 250

        * Remove extra assignments of `this_number_samples` and suppress plot

        The plot that is suppressed is the large nucleotide quilt when there is a large
        number of samples. Is it okay to suppress this plot @kclem?

        * Implement parallel plotting in CRISPRessoAggregate

        * Fix sample indexing error and heatmap scaling for large number of samples

        * Add parameter `--suppress_batch_summary_plots`

        If many runs are run at the same time, batch summary plots may fail because they are too large for matplotlib. This parameter `--suppress_batch_summary_plots` allows individual runs to be plotted, but suppresses batch summary plots that may otherwise be too big.

        * Pep formatting cleanup

        * Add summary nucleotide plots to aggregate

        * Aggregate plots are paginated

        * Update CRISPRessoAggregateCORE.py

        Remove max sample limit for plotting

        * Add --max_samples_per_summary_plot to CRISPRessoAggregate

        Parameterize the max number of samples to plot on each page of reports. Additional PDFs will be created with this number of samples on them.

        * Add plotly function to plot an interactive heatmap

        * Fix deprecated numpy type to suppress warning

        * Add plotting of heatmaps to CRISPRessoAggregateCORE to summarize modification types

        These heatmaps are interactive (zoomable and pannable) and show for each sample
        the percentage of insertions, substitutions, and deletions.

        * Add the heatmap summaries to the CRISPRessoAggregate report

        * Update Bootstrap to 5.1.3

        This is mainly so that we can use the fullscreen modal functionality in this version.

        * Move the plotly heatmaps to a Bootstrap modal

        * Fix bug where plots were not filling up entire modal.

        I have tried countless different ways for this to work, and this is the best
        that I can come up with. After the modal is opened it triggers the plot to
        resize, and then for some reason you need to trigger the resize event. I think
        this is because a `div` changing size won't actually trigger the resizing of the
        plot (and neither will just calling `Plotly.Plots.resize`...?!).

        * Update the axis labels and add autosize to plotly heatmaps

        I'm pretty sure the autosize doesn't do anything, but it is there for good
        measure.

        * Abandon attempts to make plots fullscreen

        This includes removing the Bootstrap modal (two out of the three plots would
        resize properly and I couldn't figure out a way to have the plot displayed
        outside of the modal). I have left in some javascript to make the plot
        fullscreen, but I couldn't get the formatting quite right and the plot wasn't
        much bigger in the fullscreen version because there was a ton of space between
        the plot and the heatmap. If some brave soul would like to tackle it, feel free!

        * Rename and refactor how plot data is passed around

        I have consolidated how the plot data is passed around, so that now you can pass
        in only one dict with all of the information instead of 4 or 5 separate
        parameters. I also renamed the `heatmap_plot_*` to
        `allele_modification_heatmap_*`.

        * Implement the line plot version of the modification percentages

        This also includes correctly resizing the plot when the line plot tab is
        selected!

        * Change default `max_samples_per_summary_plot` to be 150 instead of 250

        * Remove extra assignments of `this_number_samples` and suppress plot

        The plot that is suppressed is the large nucleotide quilt when there is a large
        number of samples. Is it okay to suppress this plot @kclem?

        * Implement parallel plotting in CRISPRessoAggregate

        * Fix sample indexing error and heatmap scaling for large number of samples

        * Add plotly requirement to setup.py

        * Remove space around vertical barcharts

        * Add scrollbar to long images in multiReport

        * Fill in default (empty) values to allele modification plots

        When not running CRISPRessoAggregate, default values for the
        `allele_modification_heatmap_plot` and `allele_modification_lin_plot`
        dictionaries will be set so that the template can be properly rendered.

        * Include CRISPRessoBatch in the refactor of how summary_plot dicts are handled

        * Update dockerfile for new docker

        * minor bug fixes for plotCustomAllelePlot.py to work with Python3 (#212)

        * Allow for flexible parsing of quant window coordinates

        * CRISPRessoPooled debug flash command, fix pep formatting

        * Set flexiguide homology parameter type to int

        * Coerce ints in batch file checking (#200)

        * Batch type coerce and r2 file check

        * Revert "Batch type coerce and r2 file check"

        This reverts commit f91736688ea9739cf3063e3601c52ad6da1116a4.

        * Coerce int values

        * Handle multiple qwcs in batch mode

        If multiple qwcs were provided in batch mode, a parsing error would occur. This fixes this bug.

        * Fix bug from old pandas for int cols

        Evidently old pandas versions throw an error if a column doesn't exist. This checks to see if the column exists before the values are set.

        * Create allele modification heatmaps and line plots in CRISPRessoBatch

        * Add allele modification heatmaps and line plots to CRISPRessoBatch

        * Make all plots in CRISPRessoBatch run in parallel

        * Make `--suppress_batch_summary_plots` store true

        Also, only open and shutdown the process pool when necessary.

        * Add blank values for allele_modification entries when not present

        Co-authored-by: Kendell Clement <[email protected]>
        Co-authored-by: dharjanto <[email protected]>
        Co-authored-by: Samuel Nichols <[email protected]>

    commit f67376fc9ab0e407d4086aa42fd1c77706ebc9c0
    Author: Kendell Clement <[email protected]>
    Date:   Fri Apr 15 00:46:30 2022 -0400

        Fix bug from old pandas for int cols

        Evidently old pandas versions throw an error if a column doesn't exist. This checks to see if the column exists before the values are set.

    commit b34fe2956ff88629809b2434878028723dfc4895
    Author: Kendell Clement <[email protected]>
    Date:   Thu Apr 14 23:58:07 2022 -0400

        Handle multiple qwcs in batch mode

        If multiple qwcs were provided in batch mode, a parsing error would occur. This fixes this bug.

    commit c94e3b9f2e301bda91e9c1e6f4ef794b33b5dbf0
    Author: Samuel Nichols <[email protected]>
    Date:   Thu Apr 14 21:48:32 2022 -0600

        Coerce ints in batch file checking (#200)

        * Batch type coerce and r2 file check

        * Revert "Batch type coerce and r2 file check"

        This reverts commit f91736688ea9739cf3063e3601c52ad6da1116a4.

        * Coerce int values

    commit fc4542491bb86eb143db0044a848a56234403496
    Author: Kendell Clement <[email protected]>
    Date:   Thu Apr 14 22:13:23 2022 -0400

        Set flexiguide homology parameter type to int

    commit 23fe2aa8e26067d1bcf36bfafc67e023c7588d2f
    Author: Kendell Clement <[email protected]>
    Date:   Thu Apr 14 22:12:37 2022 -0400

        CRISPRessoPooled debug flash command, fix pep formatting

    commit d292d33d8c1fa3bfd2cee656643fd47bcdab161d
    Author: Kendell Clement <[email protected]>
    Date:   Thu Apr 14 22:00:19 2022 -0400

        Allow for flexible parsing of quant window coordinates

    commit e1667cb53a7ea6fbb33369c8530a78639ed423ec
    Author: dharjanto <[email protected]>
    Date:   Mon Apr 11 22:08:21 2022 -0400

        minor bug fixes for plotCustomAllelePlot.py to work with Python3 (#212)

    commit 7b8f6788da18f6ab173fa3c3d10f4ab6bb2acc26
    Author: Samuel Nichols <[email protected]>
    Date:   Fri Apr 8 10:21:00 2022 -0600

        Update README

    commit 9bc24cd0474ed9f398dff64274d3181c4b2f8637
    Author: Samuel Nichols <[email protected]>
    Date:   Tue Mar 29 11:25:09 2022 -0600

        Using Amplicon_Name

    commit 88ac5d72074b3da63de035e02c911ce34cd29414
    Merge: b6057a2d e5afa478
    Author: Samuel Nichols <[email protected]>
    Date:   Mon Mar 28 22:32:09 2022 -0600

        Merge remote-tracking branch 'origin/master' into 2-flexible-pooled-input

    commit b6057a2d54cb8637ff0900416de8e2de72213f76
    Author: Samuel Nichols <[email protected]>
    Date:   Mon Mar 28 20:53:05 2022 -0600

        Printing info statements for matched headers

    commit af4ab6e8507d7aa4b7b68f217a458e0d9c966f55
    Merge: bbb7d6f0 51a943c3
    Author: Cole Lyman <[email protected]>
    Date:   Fri Mar 25 09:44:13 2022 -0600

        Merge branch 'pinellolab:master' into master

    commit 3c1eb012fc02563e3e963f17a62c7e932f5bcddc
    Author: Samuel Nichols <[email protected]>
    Date:   Thu Mar 24 12:31:43 2022 -0600

        Debugging and column checking

    commit 0b47acbc592a6df6adf14641357b2104b76be691
    Author: Samuel Nichols <[email protected]>
    Date:   Wed Mar 23 09:42:51 2022 -0600

        New variables added to pooled

    commit a0ff3a44d6d19d7b37f91919b5c0180206f72d53
    Author: Samuel Nichols <[email protected]>
    Date:   Mon Mar 21 09:32:28 2022 -0600

        Read as string not bytes

    commit 710675fc3c0307e21103abd604315b47ff80a894
    Author: Samuel Nichols <[email protected]>
    Date:   Wed Mar 16 13:51:30 2022 -0600

        Adding command building for new options

    commit f386818a48e5c840bd567611e6f1320c8146cac7
    Author: Samuel Nichols <[email protected]>
    Date:   Wed Mar 16 10:08:33 2022 -0600

        Comment out df_template.iloc instance

    commit eb5e309da57c8b96cd760728ddbf67be05f30d1c
    Author: Samuel Nichols <[email protected]>
    Date:   Wed Mar 16 09:59:19 2022 -0600

        Potential solution for flexible headers

    commit 51a943c3a8f8181963acc420e75a5e8ee103cf7c
    Author: Kendell Clement <[email protected]>
    Date:   Tue Mar 15 11:00:46 2022 -0400

        CRISPRessoPooled pep formatting and fix

        CRISPRessoPooled doesn't re-count reads if it has been run once and the `aligned_pooled_bam` is provided as input
        pep code formatting changes

    commit bbb7d6f0907aa13518d20e7f470e7de518b825f4
    Merge: ddbd39f0 5a10d638
    Author: Kendell Clement <[email protected]>
    Date:   Tue Mar 15 10:23:38 2022 -0400

        Merge branch 'master' of https://github.com/edilytics/CRISPResso2

    commit 5a10d638c638f21f8a2934955e92ef7e117b889e
    Author: Kendell Clement <[email protected]>
    Date:   Sat Feb 26 14:21:57 2022 -0500

        Move metadata for bam input and output

    commit e5afa4784d5330a1dc95c5deafcd9217edeac631
    Author: Samuel Nichols <[email protected]>
    Date:   Wed Feb 16 10:20:24 2022 -0700

        Coerce int values

    commit ede7d85b50055311908000578c76a1860ae9de4d
    Author: Samuel Nichols <[email protected]>
    Date:   Wed Feb 16 10:18:29 2022 -0700

        Revert "Batch type coerce and r2 file check"

        This reverts commit f91736688ea9739cf3063e3601c52ad6da1116a4.

    commit f91736688ea9739cf3063e3601c52ad6da1116a4
    Author: Samuel Nichols <[email protected]>
    Date:   Wed Feb 16 10:10:52 2022 -0700

        Batch type coerce and r2 file check

    commit 7b4a310b0f8b64c00e02eca3d522ad50d39b43ae
    Author: Kendell Clement <[email protected]>
    Date:   Tue Feb 15 22:18:05 2022 -0500

        Reiterate WGS region file is tab-separated

        Add note to WGS description that region file should be tab-separated. Closes #199

    commit b8497542e388ad401d0815d426f27abc3201a76d
    Author: kclem <[email protected]>
    Date:   Fri Feb 11 15:07:14 2022 -0500

        Extend x-axis to longest scaffold incorporation length

    commit ab7248947afade089809c74bfe6e9d5394e8f6dc
    Author: kclem <[email protected]>
    Date:   Wed Feb 9 17:05:11 2022 -0500

        Fix prime editing indexing for plots

    commit ddbd39f06b262d5ebd2cc69e116c08b22b6bd84e
    Merge: a7ffd468 442a48c7
    Author: Kendell Clement <[email protected]>
    Date:   Thu Jan 13 15:35:36 2022 -0500

        Merge branch 'pinellolab:master' into master

    commit 442a48c7f4c62ec2ebc95fe268475e5e2a4b2f0c
    Author: Cole Lyman <[email protected]>
    Date:   Tue Jan 11 15:28:28 2022 -0700

        Indel alignment fix (#182)

        * Fix bug in CRISPRessoCompare where sample names were not properly set

        This was a place where it was (partially) missed during the crispresso2_info
        object refactoring.

        * Add test case for `find_indels_substitutions`

        This test case is extracted from the CRISPRessoBatch integration test and
        provides an example where there is an insertion at the edge of the include
        index.

        * Fix a bug in `find_indels_substitutions`

        The bug that this commit fixes is when an insertion occurs at the edge of the
        include indexes. The trouble with this earlier was that it was using the `idx`
        to calculate the size of the insertion, but the `idx` wasn't being incremented
        anymore because it was outside of the include window.

        * Add a unit test for `find_indels_substitutions`

        This unit test checks for deletions at the end of a sequence, which are
        inherently outside of the include_indx_set window.

        * Add a unit test for `find_indels…
Snicker7 added a commit to edilytics/CRISPResso2 that referenced this pull request Apr 4, 2024
* imports C2Pro plots if available

* added --use_matplotlib flag

* added C2Pro
matched api function signatures

* added api args for plotly

* added **kwargs

* renamed config to custom_config, more specificity

* added backend flag for plotly kaleido

* added pro_installed boolean for templates, added plotly dependency to report templates

* Squashed commit of the following:

commit c909ea3b34e87ce637e00dac075d2bb2f8bfb954
Author: McKay <[email protected]>
Date:   Thu Feb 15 15:55:23 2024 -0700

    added plotly dependency for pro

commit 76b3601f6a0144f100266153f1c999e0c5de65de
Author: Samuel Nichols <[email protected]>
Date:   Fri Jan 12 09:56:19 2024 -0700

    Squashed commit of the following:

    commit 603f2eff9d1aa21ae95f3e134da303b8018d3a33
    Author: Samuel Nichols <[email protected]>
    Date:   Fri Jan 12 09:48:20 2024 -0700

        fix guardrails partial

    commit 22fc03183a8070c30dfb74d5c23575ac19019855
    Author: Samuel Nichols <[email protected]>
    Date:   Fri Jan 12 08:54:01 2024 -0700

        Add guardrail partial

    commit e55f6b21972b578261bc5a864ce1d653d98f9e34
    Author: Samuel Nichols <[email protected]>
    Date:   Mon Jan 8 07:50:59 2024 -0700

        Functional guardrails, needs reports update

    commit 6e968e9699ed59a47d88191d03768e042d8b60a4
    Merge: 32b49685 e948ce10
    Author: Samuel Nichols <[email protected]>
    Date:   Mon Dec 18 13:34:36 2023 -0700

        Merge branch 'guardrails-clean-history' of https://github.com/edilytics/CRISPResso2 into guardrails-clean-history

    commit 32b49685da320501dad2b0ebbb57887b66220ba8
    Author: Samuel Nichols <[email protected]>
    Date:   Fri Dec 15 15:27:04 2023 -0700

        Include guardrail functions

    commit 4e309cf6f732565d635de3d4c5d074ada3027e2d
    Author: Cole Lyman <[email protected]>
    Date:   Mon Dec 18 10:51:55 2023 -0700

        Refactor to use CRISPRessoReports module

    commit e648dc087c0055bc5d2fca13c64071a371dea941
    Author: Cole Lyman <[email protected]>
    Date:   Mon Dec 18 10:51:11 2023 -0700

        Add CRISPRessoReports subtree

    commit e948ce107ebb0d1d99010ed12e937f34b5e607d4
    Author: Samuel Nichols <[email protected]>
    Date:   Fri Dec 15 15:27:04 2023 -0700

        Include guardrail functions

    commit d33c748871a625facfe8d792e29c77ab9779138f
    Author: Kendell Clement <[email protected]>
    Date:   Tue Nov 7 16:31:06 2023 -0700

        Include parameter --assign_ambiguous_alignments_to_first_reference in readme

    commit a1435f7f491a6a61434f3051e39f39a4c9bf1edc
    Author: Kendell Clement <[email protected]>
    Date:   Wed Oct 11 17:17:30 2023 -0600

        Enable quantification by sgRNA (#348)

        This PR includes:
        - storing the sgRNA-specific editing locations in the crispresso2_info object. Previously, each amplicon would record the indices of quantification windows across the guide, but not for individual guides. This stores the information for each guide in crispresso2_info['results']['refs'][reference_name]['sgRNA_include_idxs']
        - a script (count_sgRNA_specific_edits.py) to parse through an allele table output from a completed CRISPResso run (`--write_detailed_allele_table` flag required) to count edits in each sgRNA separately.

        I don't have a good double-edited sample handy, but it can be run on the demo HDR data [hdr.fastq.gz](http://crispresso.pinellolab.org/static/demo/hdr.fastq.gz) using the command:

        ```
        CRISPResso -r1 hdr.fastq.gz -a acatttgcttctgacacaactgtgttcactagcaacctcaaacagacaccatggtgcatctgactcctgTggagaagtctgccgttactgccctgtggggcaaggtgaacgtggatgaagttggtggtgaggccctgggcaggttggtatcaaggtta -e acatttgcttctgacacaactgtgttcactagcaacctcaaacagacaccatggtgcaCctgactccGgaggagaagtctgccgttactgcGctgtggggcaaggtgaacgtggatgaagttggtggtgaggccctgggcaggttggtatcaaggtta -c atggtgcatctgactcctgTggagaagtctgccgttactgccctgtggggcaaggtgaacgtggatgaagttggtggtgaggccctgggcag -g TGCACCATGGTGTCTGTTTG,GATGAAGTTGGTGGTGAGGCCC --write_detailed_allele_table  -n hdr3 -p max -gn guide1,guide2
        ```

        ```
        python CRISPResso2/scripts/count_sgRNA_specific_edits.py -f CRISPResso_on_hdr3
        ```

        This produces:
        ```
        Processed 25000 alleles
        Reference: Reference (2391/23415 modified reads)
                UNMODIFIED: 21024
                MODIFIED guide1: 2359
                MODIFIED guide2: 32
        Reference: HDR (856/1577 modified reads)
                UNMODIFIED: 721
                MODIFIED guide1: 854
                MODIFIED guide1 + guide2: 1
                MODIFIED guide2: 1
        ```

    commit 2e3da02fdbed2fa8ae02a277763d65a502459827
    Author: Cole Lyman <[email protected]>
    Date:   Tue Oct 10 15:29:08 2023 -0600

        changed tuple to list for matplotlib change (#31) (#346)

        Co-authored-by: mbowcut2 <[email protected]>

    commit cd3c332135fe4db0f9218e3d87263d5c65838ed9
    Author: Kendell Clement <[email protected]>
    Date:   Sun Oct 1 01:54:46 2023 -0600

        rename script to camel case

    commit 7c719d65fb36ac7654db9040f226564ea28fcab9
    Author: Kendell Clement <[email protected]>
    Date:   Sun Oct 1 01:53:44 2023 -0600

        Add new script for counting high quality bases

    commit f97cd2795e89464bcc9321ccfdbca3e6af2bcb4f
    Author: Kendell Clement <[email protected]>
    Date:   Thu Sep 14 15:15:30 2023 -0600

        Prime editing alignment params (#336)

        Adds two parameters to control alignment of pegRNA components: --prime_editing_gap_open_penalty and --prime_editing_gap_extend_penalty.

        CRISPResso checks to see whether the pegRNA spacer and extension sequence are in the correct orientation, but sometimes they could align in the incorrect orientation with a higher score (e.g. via insertion of multiple gaps, whereas a single long gap would be preferred). Introducing these two parameters allows users to adjust the alignment parameters specifically for these prime-editing checks without adjusting the global alignment parameters which will be applied to reads that are aligned to the WT reference/prime-editing reference sequences.

        The new prime_editing_gap_open_penalty is set to -50, a higher gap open penalty than the default needleman_wunsch_gap_open penalty (-20). This commit breaks backward-reproducibility, but mostly in the checking of pegRNA component orientation - so previously some CRISPResso runs would have failed and produced an error, but now they will (hopefully) succeed. To achieve complete backward reproducibility, add the flag --prime_editing_gap_open_penalty -20 to runs.

    commit 64cbf36dae85cffa2c15e73f2a7ee8aa1077d917
    Author: Cole Lyman <[email protected]>
    Date:   Thu Sep 7 16:43:30 2023 -0600

        Fix samtools piping (#325)

        * Remove samtools pipe stderr to stdout

        Sometimes some of the libraries that samtools depends on don't have the correct
        version information, and as such samtools will report this to stderr when run.
        Because we pipe the output of samtools, we expect it to be valid SAM format, but
        when these library version messages are reported, it breaks CRISPRessoWGS.

        * Remove extra spacing at end of lines and add missing comma in WGS

        * Log stderr from samtools in CRISPRessoWGS
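
The stream separation described above can be sketched roughly as follows. This is a minimal illustration, not the CRISPRessoWGS code: `run_and_split_streams` is a hypothetical helper, and a small Python child process stands in for samtools so the example runs without samtools installed.

```python
import subprocess
import sys


def run_and_split_streams(cmd):
    """Run cmd and return (stdout, stderr) as separate strings."""
    proc = subprocess.run(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True
    )
    return proc.stdout, proc.stderr


# Stand-in for samtools (an assumption for this sketch): it writes a library
# warning to stderr and one SAM-like line to stdout.
fake_samtools = [
    sys.executable,
    "-c",
    "import sys; sys.stderr.write('library version warning\\n'); "
    "sys.stdout.write('read1\\t0\\tchr1\\n')",
]

sam_text, warnings = run_and_split_streams(fake_samtools)
# sam_text holds only the SAM-like output; warnings can be logged separately.
```

Keeping stderr out of the parsed stream means a warning never has to be treated as a SAM record, while the warning text is still available for logging.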

    commit 8feff4101f27406d9d88ace97d31a518276bff3f
    Author: Cole Lyman <[email protected]>
    Date:   Fri Sep 1 09:43:56 2023 -0600

        Replace link to CRISPResso schematic with raw URL in README (#329)

        * Replace link to CRISPResso schematic with raw URL

        * Add new lines to the beginning of unordered lists

    commit 2e9e6bff5bcc536d5e2ba1440d1ab96d9d47efd6
    Author: Kendell Clement <[email protected]>
    Date:   Thu Aug 10 00:52:12 2023 -0600

        Try to unbreak CircleCI

    commit ae5b95246cb0f6d66c4cbfb50cf8f5a9626b0827
    Author: Kendell Clement <[email protected]>
    Date:   Thu Aug 10 00:17:27 2023 -0600

        Center command line text messages

    commit 4d9c71ecf2248c9bb1e10430178dc318b6621c8b
    Author: Kendell Clement <[email protected]>
    Date:   Thu Aug 10 00:17:07 2023 -0600

        Fix bug in prime-editing scaffold-incorporation plotting

        If read is too short, scaffold incorporation detection will fail because it will check beyond the length of the read.

    commit 2b36a1a5c35e8a93516ce8baf464595615e0f402
    Author: Kendell Clement <[email protected]>
    Date:   Wed Aug 9 15:29:48 2023 -0600

        CRISPRessoPooled --compile_postrun_references bug fixes

    commit 3e04d1d402bcf95edd39fc7c8c9af61bb380f9db
    Author: Kendell Clement <[email protected]>
    Date:   Tue Aug 8 23:30:15 2023 -0600

        Fix missing ' in Pooled --demultiplex_only_at_amplicons

    commit 06af527f9e2020c5cf251e7f1cec0b1eca1c1664
    Author: Cole Lyman <[email protected]>
    Date:   Mon Jul 24 10:47:46 2023 -0600

        Sort pandas dataframes by # of reads and sequences so that the order is consistent (#316)

        * Make sorting stable

        * Including c files

        * Sort by #Reads instead of %Reads to avoid floating point errors

        ---------

        Co-authored-by: Samuel Nichols <[email protected]>
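
A rough sketch of why sorting on the integer read count with a tie-breaker gives a consistent order. It uses plain Python lists rather than pandas so that it runs without dependencies; the column name `Aligned_Sequence` and the helper `sort_alleles` are assumptions made for illustration.

```python
def sort_alleles(rows):
    """Sort by descending read count, breaking ties by sequence."""
    # Integer counts compare exactly; percentages derived from them are floats
    # and can differ in the last bits depending on how they were computed.
    return sorted(rows, key=lambda row: (-row["#Reads"], row["Aligned_Sequence"]))


rows = [
    {"Aligned_Sequence": "ACGT", "#Reads": 5},
    {"Aligned_Sequence": "AAGT", "#Reads": 20},
    {"Aligned_Sequence": "ACGA", "#Reads": 5},
]

ordered = [row["Aligned_Sequence"] for row in sort_alleles(rows)]
ordered_from_reversed = [
    row["Aligned_Sequence"] for row in sort_alleles(list(reversed(rows)))
]
```

Because every row has a distinct (count, sequence) key, the result does not depend on the input order.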

    commit de05533b3511a84f3b6b14fc2ef64db041613261
    Author: Cole Lyman <[email protected]>
    Date:   Thu Jul 6 13:54:45 2023 -0600

        Fix multiprocessing lambda pickling (#311)

        * Fix running plots in parallel

        The reason the plots were running slower before this change is because I was
        calling the plot function, not passing it to `submit`. So it was essentially
        running in serial, but worse because it was still spinning up/down the
        processes.

        * Fix multiprocessing lambda pickling (#20)

        * Refactor process_futures to be a dict

        This makes debugging much easier because you can associate the arguments to the
        future with the results.

        * Fix the pickling error when running in multiprocessing

        Only top-level functions (not lambdas) can be pickled to use in multiprocessing
        pools, thus the lambdas are converted to a regular function.

        * Further fixes to pickling multiprocessing error (#21)

        * Refactor process_futures to be a dict

        This makes debugging much easier because you can associate the arguments to the
        future with the results.

        * Fix the pickling error when running in multiprocessing

        Only top-level functions (not lambdas) can be pickled to use in multiprocessing
        pools, thus the lambdas are converted to a regular function.

        * Use Counter instead of defaultdict in CRISPRessoCORE

        * Update process_futures to dict in Batch and Aggregate
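
The pickling constraint mentioned above can be checked with a short sketch. `can_pickle` and `plot_task` are hypothetical names used only for this illustration.

```python
import pickle


def can_pickle(obj):
    """Return True if obj can be serialized with pickle."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


def plot_task(name):
    """Hypothetical plotting function defined at module top level."""
    return "plotted " + name


# Functions are pickled by reference to their qualified name. A lambda has no
# importable name, so it cannot be sent to a worker process.
lambda_task = lambda name: "plotted " + name

lambda_is_picklable = can_pickle(lambda_task)
```

Replacing the lambda with a named top-level function such as `plot_task` is the usual way around this when work is handed to a process pool.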

    commit ebb016dff46c280dce8c3c09e8ac0e0cc25d4d74
    Author: Kendell Clement <[email protected]>
    Date:   Mon Jul 3 17:12:09 2023 -0600

        Enable CRISPRessoPooled multiprocessing when os allows multi-thread file append

    commit 7285da0e987b77b72c8885bb35940e0f50c146bd
    Author: Kendell Clement <[email protected]>
    Date:   Fri Jun 23 16:50:33 2023 -0600

        Fix print bug for invalid fastq

    commit 9acdeac67441f9a1d55ac94b153bcb68fb89b92c
    Author: kclem <[email protected]>
    Date:   Wed Jun 21 16:03:48 2023 -0600

        Slugify before creating filename - replaces invalid characters in batch names with _
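
As a rough sketch of the idea, the replacement could look like this. The set of characters treated as valid and the name `slugify_name` are assumptions; the project's own function may differ.

```python
import re


def slugify_name(name):
    """Replace every character outside [A-Za-z0-9._-] with an underscore."""
    return re.sub(r"[^A-Za-z0-9._-]", "_", name)


safe_name = slugify_name("batch 1/sample:A")
```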

    commit f97e29c67de4c80b8d6b9cf334f363be4b514ade
    Author: Cole Lyman <[email protected]>
    Date:   Wed Jun 21 14:43:43 2023 -0600

        Add verbosity argument to CRISPRessoAggregate (#18) fixes #306 (#307)

        * Add verbosity argument to CRISPRessoAggregate (#18)

        * Allow for amplicon and guide seqs to be some variant of NA in batch (#19)

        This was discovered when attempting to infer amplicon sequences in batch mode on
        the web interface, NAs were supplied for the amplicon sequences to the sub
        CRISPResso commands.

    commit 32e1e9797da5c3033cdc588e92f06b8813961953
    Author: Mark Clement <[email protected]>
    Date:   Wed Jun 21 14:01:00 2023 -0600

        Allow for interrogation of overlapping sgRNA sites

    commit 7248ba8c4deee125ad1ec12fdf1294a84d5f6f93
    Author: Kendell Clement <[email protected]>
    Date:   Mon Jun 12 12:16:47 2023 -0600

        Check input fastq file format

        Asserts input format of fastq files - including if gzipped files are missing the gz suffix.
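
One way such a check could work is sketched below, under the assumption that it inspects the gzip magic bytes and the first FASTQ header character. `check_fastq` is a hypothetical helper, not the function added in this commit.

```python
import gzip
import os
import tempfile

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def check_fastq(path):
    """Return True if path looks like a FASTQ file; raise ValueError otherwise."""
    with open(path, "rb") as handle:
        gzipped = handle.read(2) == GZIP_MAGIC
    if gzipped and not path.endswith(".gz"):
        raise ValueError(f"{path} is gzip-compressed but lacks a .gz suffix")
    opener = gzip.open if gzipped else open
    with opener(path, "rt") as handle:
        first_line = handle.readline()
    if not first_line.startswith("@"):
        raise ValueError(f"{path} does not start with a FASTQ '@' header line")
    return True


# Demo on two temporary files holding the same record.
record = "@read1\nACGT\n+\nIIII\n"
tmp_dir = tempfile.mkdtemp()
plain_path = os.path.join(tmp_dir, "reads.fastq")
with open(plain_path, "w") as handle:
    handle.write(record)
mislabeled_path = os.path.join(tmp_dir, "compressed.fastq")  # gzipped, no .gz
with gzip.open(mislabeled_path, "wt") as handle:
    handle.write(record)

plain_ok = check_fastq(plain_path)
try:
    check_fastq(mislabeled_path)
    mislabeled_error = None
except ValueError as err:
    mislabeled_error = str(err)
```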

    commit 83c8ab8f462e7d8c1d04c08c1a398b874f517251
    Author: Kendell Clement <[email protected]>
    Date:   Mon Jun 5 13:41:55 2023 -0600

        Fix CRISPRessoArgParser

    commit 14a2c8577f566e1b72d5f4e72cd6cd22079610be
    Author: Kendell Clement <[email protected]>
    Date:   Mon Jun 5 13:29:31 2023 -0600

        Cosmetic updates for command-line use

        - version bump to 2.2.13
        - If no args are provided, the command line version will print out an abbreviated help message
        - parameters can be excluded from CRISPRessoArgParser

    commit 1cd54bc1d03360c3d8121ba9e66b3589fe1cf252
    Author: Cole Lyman <[email protected]>
    Date:   Thu May 11 14:31:47 2023 -0600

        Fix multiprocessing error, don't start pool when only using single thread (#302)

        * Update README to have consistent use of `--base_editor_output` (#16)

        * Add files via upload

        * Only start process pools when using multiple processes

        This is mainly to solve the issue when running on AWS Lambda, but this should
        improve single core performance overall.

        ---------

        Co-authored-by: Kendell Clement <[email protected]>
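
A minimal sketch of the "only start a pool when needed" pattern. `run_tasks` and its parameters are hypothetical and not taken from the CRISPResso code base.

```python
from concurrent.futures import ProcessPoolExecutor


def run_tasks(fn, items, n_processes=1):
    """Apply fn to each item, starting a process pool only when asked to."""
    if n_processes <= 1:
        # Single-process path: no pool, no pickling, no worker start-up cost.
        return [fn(item) for item in items]
    with ProcessPoolExecutor(max_workers=n_processes) as pool:
        return list(pool.map(fn, items))


serial_results = run_tasks(abs, [-1, 2, -3], n_processes=1)
```

The serial branch also helps in environments where worker processes cannot be created at all.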

    commit 92a705c939b370373a70cf6ae9f1616de33288b9
    Author: Cole Lyman <[email protected]>
    Date:   Thu May 11 14:31:06 2023 -0600

        Update `base_editor` parameters in README and add Plot Harness (#301)

        * Update README to have consistent use of `--base_editor_output` (#16)

        * Add files via upload

        ---------

        Co-authored-by: Kendell Clement <[email protected]>

    commit 7d46c4490235df45c5546b1b470e4e6a99727031
    Author: Cole Lyman <[email protected]>
    Date:   Wed May 10 15:41:33 2023 -0600

        Clarify CRISPRessoWGS intended use (#303)

        * Update README to have consistent use of `--base_editor_output` (#16)

        * Add sample plotting jupyter notebook

        * Add clarifying info to CRISPRessoWGS description

        Clarify WGS usage

    commit 833a701787bb47674b3e921c38cac6189c775cf7
    Author: Kendell Clement <[email protected]>
    Date:   Thu May 4 17:02:46 2023 -0400

        Remove debug print statements

    commit 712eb2a11825e8d36f2870deb12b35486bd633fb
    Author: Kendell Clement <[email protected]>
    Date:   Thu May 4 16:40:07 2023 -0400

        Allow dashes in filenames resolve #73

    commit a439f094745b2b5e7f032f0777d4c67e6d6f93c5
    Author: Kendell Clement <[email protected]>
    Date:   Sat Apr 22 23:41:58 2023 -0400

        Raise exceptions from within futures in plot_pool

    commit 7e807a60de2a9d18bccd034b87106ceaf7153338
    Author: Kendell Clement <[email protected]>
    Date:   Sat Apr 22 23:38:56 2023 -0400

        Fix future pandas indexing warning

        The pandas warning was "FutureWarning: Calling float on a single element Series is deprecated and will raise a TypeError in the future. Use float(ser.iloc[0]) instead"

    commit 304a92aa7a7ef8c705cb070dce25d9a2e5745ba9
    Author: Cole Lyman <[email protected]>
    Date:   Thu Apr 20 13:59:27 2023 -0600

        Remove debug print statements fixes #295 (#297)

        The format string option used here is only available in Python version >=3.8.

    commit 478c06f784603e96d20f96e91993fdcc4ac35c8a
    Author: Kendell Clement <[email protected]>
    Date:   Thu Apr 13 12:09:26 2023 -0400

        Update plotCustomAllelePlot.py script for #292 (#293)

        Update type of 'max_rows' param to int
        Fix location of 'args' in crispresso2_info object

    commit bcdae39e05d530f4a4e78738c3b30f7664981919
    Author: Kendell Clement <[email protected]>
    Date:   Mon Mar 27 13:18:34 2023 -0400

        Update pooled parameter format

    commit 546446e36e7e68b527767d6c31ec341a49df2059
    Author: Kendell Clement <[email protected]>
    Date:   Tue Feb 14 16:26:23 2023 -0500

        Fix running plots in parallel (#286)

        The reason the plots were running slower before this change is because I was
        calling the plot function, not passing it to `submit`. So it was essentially
        running in serial, but worse because it was still spinning up/down the
        processes.

        Co-authored-by: Cole Lyman <[email protected]>
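
The difference between calling the function and passing it to `submit` can be sketched as follows. A thread pool is used here instead of a process pool so the example runs anywhere; `make_plot` and `run_plots` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def make_plot(name, width):
    """Stand-in for a plotting function; returns a label instead of drawing."""
    return f"{name}:{width}"


def run_plots(plot_args, n_workers=2):
    """Submit one task per kwargs dict and collect results by plot name."""
    results = {}
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Pass the callable and its arguments separately. Writing
        # pool.submit(make_plot(**kwargs)) would run the plot right here,
        # serially, before the pool ever sees it.
        futures = {pool.submit(make_plot, **kwargs): kwargs for kwargs in plot_args}
        for future in as_completed(futures):
            kwargs = futures[future]
            # result() re-raises any exception raised inside the task.
            results[kwargs["name"]] = future.result()
    return results


plots = run_plots([{"name": "a", "width": 1}, {"name": "b", "width": 2}])
```

Keeping the futures in a dict keyed by future makes it easy to tell which arguments produced a given result or error.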

    commit d75f32a2eb5aeaaee866c09e5655a3e27af8b1a1
    Author: kclem <[email protected]>
    Date:   Fri Feb 10 15:45:15 2023 -0500

        Fix #283 to avoid filename collisions

        Previously, amplicon names longer than 21bp were truncated, but the check for uniqueness wasn't working, so it would overwrite some plot files. This fixes the filename collision and enforces uniqueness in reference filename prefixes. Thanks @mbiokyle29
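
A rough sketch of truncating names while enforcing unique prefixes. The counter-suffix scheme and the name `unique_prefixes` are assumptions for illustration; the actual fix may resolve collisions differently. The sketch assumes the input names themselves are distinct.

```python
def unique_prefixes(names, max_len=21):
    """Map each name to a prefix of at most max_len characters, all distinct."""
    used = set()
    prefixes = {}
    for name in names:
        candidate = name[:max_len]
        counter = 1
        while candidate in used:
            # On collision, shorten further and append a numeric suffix.
            suffix = f"_{counter}"
            candidate = name[: max_len - len(suffix)] + suffix
            counter += 1
        used.add(candidate)
        prefixes[name] = candidate
    return prefixes


prefixes = unique_prefixes(
    ["amplicon_with_a_very_long_name_A", "amplicon_with_a_very_long_name_B"]
)
```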

    commit e577318006cd17b2725bd028e5e56634c6eb829a
    Author: kclem <[email protected]>
    Date:   Mon Feb 6 16:37:25 2023 -0500

        Case-insensitive headers accepted in CRISPRessoPooled

    commit d34927620a4a6126a9988b3041e76f60728abbfe
    Author: Kendell Clement <[email protected]>
    Date:   Tue Jan 31 13:48:33 2023 -0500

        Fix print statement in CORE

    commit ee88b7ed89c395f68225a50dea44a2ad69d5e9a5
    Author: Kendell Clement <[email protected]>
    Date:   Tue Jan 31 13:22:51 2023 -0500

        Version bump to 2.2.12

    commit 1d4679c72d0c8b4154317c9aff5179217198e2d7
    Author: Kendell Clement <[email protected]>
    Date:   Tue Jan 31 13:01:31 2023 -0500

        Status Updates + Pooled Mixed Mode Update (#279)

        * Implement logging handler to overwrite the latest log status to file

        * Add StatusHandler to CRISPRessoCORE log

        This will take the latest log output and write it to a file (`status.txt`), the
        catch being that with each log the file is overwritten so that one can easily
        tell where CRISPResso currently is and what the error is (if any). These changes
        include some slight refactoring in order to accommodate any potential parameter
        exceptions.

        * Add StatusHandler to CRISPRessoBatch and refactor `logger.warn` to `warn`

        * Add StatusHandler to CRISPRessoPooled and a little refactoring

        * Implement `percent_complete` to the status log

        * Add StatusHandler to CRISPRessoAggregate log

        * Add StatusHandler to CRISPRessoCompare log

        * Add StatusHandler to CRISPRessoPooledWGSCompare log

        * Add StatusHandler to CRISPRessoWGS log

        * Rename `status.txt` to `CRISPResso_status.txt`

        * Modify status log names to match the tool they are generated from

        * Add percent_complete stages to CRISPRessoCORE

        These also include log statements of each plot that is being generated as well
        as fixing some variable name collisions with `ind`.

        * Format the percentage in the log to be 2 decimal places

        * Change all plotting logs from `info` to `debug` and simplify progress

        This refactors how the progress of the plots is calculated, making it much
        simpler. Before this change we would have had to keep track of the number of
        times `percent_complete` was output, but now it simply updates the percent
        complete after each amplicon is finished processing. Hopefully this will make
        things easier to maintain even though it will be a little less "accurate" (not
        sure how accurate the original implementation was...).

        * Implemented shared console log handler across all CRISPResso* calls

        This allows for easy changes to logging formatting, which was inspired by having
        to change the default logging level. The default logging level needs to be set
        at `logging.DEBUG` in order for the debug log statements to not be ignored for
        the running and status logs.

        * Add ability to set the verbosity level to each CRISPResso* tool

        This allows users to set a verbosity level between 1 and 4 using the
        `-v`/`--verbosity` CLI parameter. If the `--debug` flag is present, then the
        level will default to 4, being the most verbose.

        * Implement showing the last seen `percent_complete` when none is provided

        * Keep track of and log when multiple parallel runs are completed

        These changes modify `CRISPRessoMultiProcessing.run_crispresso_cmds` such that
        we can now display when a run is completed. This potentially breaks how
        signals and interrupts are handled with multiple runs happening, but this needs
        to be reviewed.

        * Add debug and percentage complete to CRISPRessoBatch

        * Add percent complete to CRISPRessoPooled

        * Add debug and percent_complete message to CRISPRessoAggregate

        * Add `percent_complete` to CRISPRessoCompare

        * Add `percent_complete` to CRISPRessoPooledWGSCompare

        * Add status and `percent_complete` to CRISPRessoMeta

        * Add `verbosity` arguments to CRISPRessoCompare and CRISPRessoPooledWGSCompare

        * Fixing documentation to match pooled headers

        * Header removal bug fix change documentation to guide_seq

        * Update documentation and help feature for CRISPRessoPooled

        * Remove extra newlines from CRISPRessoPooled -h

        * Make variable names as clear as my firstborn child's name

        * Update one more variable name

        * Fix bug to flow CRISPRessoPooled options to sub command

        * Make amplicon file args variable name clear

        * Update how parameters are set and retrieved from parameter object

        The refactor in the previous commit changed the type of the arguments to a
        dictionary which doesn't have the parameters as attributes, and this commit
        fixes that error.

        * Add note in output header for change in default CRISPRessoPooled

        In the next release (2.3.0) the `--demultiplex_only_at_amplicons` will be the
        default when running in mixed-mode. This is to allow for inexact alignments of
        the reads and the amplicons to the genome. For more context, see this issue
        https://github.com/pinellolab/CRISPResso2/issues/276

        * Clarify the verbosity parameter help message

        * Separate out parameters to `normalize_name` in CRISPRessoCORE

        * Separate out parameters to `normalize_name` in CRISPRessoWGS

        * Separate out parameters to `normalize_name` in CRISPRessoPooled

        * Separate out parameters to `normalize_name` in CRISPRessoCompare

        * Fix bug in CRISPRessoPooled by replacing `database_id` with `normalize_name`

        * Refactor `run_crispresso_cmds` to not require a `logger`

        This commit implements the functionality to make the `logger` object optional by
        seeing which module called the `run_crispresso_cmds` function and obtaining the
        correct object from that module name.

        The function also immediately returns when no commands are passed to it.

        * Add amplicon name to plotting debug statements in CRISPRessoCORE

        ---------

        Co-authored-by: Cole Lyman <[email protected]>
        Co-authored-by: Cole Lyman <[email protected]>
        Co-authored-by: Cole Lyman <[email protected]>
        Co-authored-by: Samuel Nichols <[email protected]>
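
The status-file behaviour described in this commit (overwrite the file with the latest message, format the percentage to two decimal places, reuse the last seen `percent_complete` when none is given) could be sketched like this. It is an illustration only, not the project's `StatusHandler`: passing the percentage through logging's `extra` argument and the exact line format are assumptions.

```python
import logging
import os
import tempfile


class StatusHandler(logging.Handler):
    """Overwrite a status file with the most recent log message."""

    def __init__(self, filename):
        super().__init__()
        self.filename = filename
        self.last_percent_complete = 0.0

    def emit(self, record):
        # Assumption: callers attach percent_complete via logging's `extra`.
        percent = getattr(record, "percent_complete", None)
        if percent is None:
            percent = self.last_percent_complete  # reuse the last seen value
        else:
            self.last_percent_complete = percent
        # Mode "w" truncates the file, so only the latest status is kept.
        with open(self.filename, "w") as handle:
            handle.write(f"{percent:.2f}% {record.getMessage()}\n")


status_path = os.path.join(tempfile.mkdtemp(), "CRISPResso_status.txt")
logger = logging.getLogger("status_sketch")
logger.setLevel(logging.DEBUG)  # DEBUG so debug-level messages reach the handler
logger.propagate = False
logger.addHandler(StatusHandler(status_path))

logger.info("Aligning reads", extra={"percent_complete": 25})
logger.debug("Plotting amplicon")  # no percentage given: 25 is reused

with open(status_path) as handle:
    status_text = handle.read()
```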

    commit ff7eca76e6a3a08af4ac18ac4e88d20f2a06b1f9
    Author: Kendell Clement <[email protected]>
    Date:   Thu Jan 26 15:27:27 2023 -0500

        CRISPRessoPooled custom header fix (#278)

        * Fixing documentation to match pooled headers

        * Header removal bug fix change documentation to guide_seq

        * Update documentation and help feature for CRISPRessoPooled

        * Remove extra newlines from CRISPRessoPooled -h

        * Make variable names as clear as my firstborn child's name

        * Update one more variable name

        Co-authored-by: Samuel Nichols <[email protected]>

    commit 104866e1080c973bb025d1a5ba59b19dca1658af
    Author: Cole Lyman <[email protected]>
    Date:   Thu Jan 5 14:00:26 2023 -0700

        Fix deprecated numpy type names (fixes #269) (#270)

        In the most recent version of numpy (1.24) some of the types have been
        deprecated. This commit fixes these errors.

    commit 58a8e42df88b66fad6b4f6ad04a5b9d9d43d01b4
    Author: Cole Lyman <[email protected]>
    Date:   Thu Jan 5 06:49:35 2023 -0700

        Add snippet about installing CRISPResso2 via bioconda on Apple silicon (#274)

        I have suffered enough trying to debug my installation, so hopefully this helps
        someone else.

        Co-authored-by: Cole Lyman <[email protected]>

    commit b9851e98104602eb78c2b384105267624295e9d3
    Author: Cole Lyman <[email protected]>
    Date:   Thu Dec 22 13:30:23 2022 -0700

        Fix bug when pooled bam is input (#265)

        This change checks to see if a bam file was input, and if so it doesn't try to
        remove any intermediate files because there aren't any.

        Co-authored-by: Cole Lyman <[email protected]>

    commit b822612642043e75a19042941f69b457ce51f517
    Author: Kendell Clement <[email protected]>
    Date:   Mon Dec 19 15:26:45 2022 -0500

        Delete vscode settings

    commit b99aa624dec68ef7d19264340ce0cafa829625f4
    Author: Kendell Clement <[email protected]>
    Date:   Mon Dec 19 13:29:14 2022 -0500

        Clarify input param help for pooled bam

    commit 3fae1e8b821ec6b1890bff6561fa8fa67dc49a04
    Author: Kendell Clement <[email protected]>
    Date:   Mon Dec 19 13:28:54 2022 -0500

        Fix #235 - Cigar string is * if read unaligned

        Previously, the bam would set the cigar string to 0 if the read was unaligned. This breaks the sam->bam conversion and causes the errors in #235.
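
For reference, a minimal sketch of a SAM record for an unaligned read, following the SAM convention that `*` marks an unavailable CIGAR. `unaligned_sam_line` is a hypothetical helper and not the code changed in this commit.

```python
def unaligned_sam_line(read_name, sequence, qualities):
    """Build the 11 mandatory SAM fields for a read with no alignment."""
    fields = [
        read_name,  # QNAME
        "4",        # FLAG: 0x4 means the read is unmapped
        "*",        # RNAME: no reference
        "0",        # POS
        "0",        # MAPQ
        "*",        # CIGAR: '*' when unavailable, never '0'
        "*",        # RNEXT
        "0",        # PNEXT
        "0",        # TLEN
        sequence,   # SEQ
        qualities,  # QUAL
    ]
    return "\t".join(fields)


sam_line = unaligned_sam_line("read1", "ACGT", "IIII")
```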

    commit c65ba07dc5a983453cdf7bb1e27005230dac6f1b
    Author: Cole Lyman <[email protected]>
    Date:   Thu Dec 8 13:48:17 2022 -0700

        Add deprecation notice (#260)

        * Add FLASh and Trimmomatic deprecation notice to CLI output

        * Add Edilytics email address to CLI output

    commit 2a30e5a45f5350ee7c6435bce1cd4edc4d31668a
    Author: Kendell Clement <[email protected]>
    Date:   Tue Dec 6 12:16:19 2022 -0500

        Format filterReadsOnSequencePresence script

    commit 9d764414edd88a46ad5e4f496e4f1c8d5d60ce3e
    Author: Kendell Clement <[email protected]>
    Date:   Fri Dec 2 22:12:54 2022 -0500

        Clarify default CRISPRessoPooled settings for use_legacy_bowtie2_options_string

    commit 9ddea40f7f02b546941ddaa4c71fc5283075051a
    Author: kclem <[email protected]>
    Date:   Mon Nov 14 10:33:04 2022 -0500

        Add check for prime editing extension sequence in prime edited sequence

        If the user specifies prime_editing_override_prime_edited_ref_seq, the supplied sequence might not contain the extension sequence (for example, if the extension sequence is not provided in the appropriate orientation), so check for that here. The extension sequence should be provided as the reverse complement of the prime-edited sequence.

    commit 152f2dd5001da7090641ee8a1326bde9f7e8104e
    Author: kclem <[email protected]>
    Date:   Wed Nov 9 11:53:41 2022 -0500

        Version bump to 2.2.11a

    commit 9ed356e3a0c6c316d0860d121772f80ddca6de1d
    Author: kclem <[email protected]>
    Date:   Wed Nov 9 11:47:30 2022 -0500

        Add param to override prime editing sequence checks

        CRISPResso checks that prime editing guides are provided in the proper orientation (e.g. pegRNA 3'->5', spacer sequence 5'->3') and checks these orientations by alignment. Sometimes, the alignment can be better in the opposite direction, and this parameter allows these checks to be overridden. Otherwise, these checks would halt the program and produce the output 'The prime editing pegRNA spacer sequence appears to be given in the 3\'->5\' order. The prime editing pegRNA spacer sequence (--prime_editing_pegRNA_spacer_seq) must be given in the RNA 5\'->3\' order.'

    commit 39dd80afb98a22b7edb6f801c363d86bb77eeb5b
    Author: kclem <[email protected]>
    Date:   Wed Nov 9 10:06:51 2022 -0500

        Update filterReadsOnSequencePresence.py

    commit fe55526927e3fb6e17c9a8a6f59c7057bc1e14eb
    Author: Kendell Clement <[email protected]>
    Date:   Mon Nov 7 22:25:16 2022 -0500

        Add script to filter input based on sequence presence

    commit 713e57a19c35180035ca35e11a5820065eda0198
    Author: Kendell Clement <[email protected]>
    Date:   Tue Oct 18 16:02:26 2022 -0400

        Allow spaces in read names for CRISPRessoWGS

    commit 39ce008bdddccdd8229c0ba185dce78bc2f66968
    Author: Cole Lyman <[email protected]>
    Date:   Sat Oct 8 21:09:58 2022 -0600

        Fix typo of CRISPResssoPlot when plotting nucleotide quilt (#250)

    commit 6a2b342c8503b7327c0a2414edfbd16912d60ca5
    Author: Kendell Clement <[email protected]>
    Date:   Sat Oct 8 23:08:47 2022 -0400

        Batch amplicon plots (#251)

        * Error out if HDR amplicon matches existing amplicon

        * Add check for amplicon sequence uniqueness

        * Fix bug with bam_input not having bam_output

        * Test for no returned lines in auto mode, version bump to 2.2.11

        * Fix pandas deprecation of df.append

    commit 726b2b93d6e419a1b0aa6a968c97edc55b4cc5a8
    Author: Kendell Clement <[email protected]>
    Date:   Thu Oct 6 16:32:02 2022 -0400

        Fix CRISPRessoBatch plot pool bug when plots are suppressed

    commit 7e5049c4dfb88cbc87c91935a91d1f51120a10c2
    Author: Cole Lyman <[email protected]>
    Date:   Wed Sep 21 21:04:51 2022 -0600

        Fix batch quilt plot name (#249)

        This fixes an incorrectly named allele quilt plot input in CRISPRessoBatch.

    commit 1821ca5029c5a1485733f13ab3f2048b4f1fa04e
    Author: Kendell Clement <[email protected]>
    Date:   Thu Sep 15 15:49:08 2022 -0400

        Version bump to 2.2.10

    commit c5f79aebfc1ae209f4ee320df250eed89a02787c
    Author: Cole Lyman <[email protected]>
    Date:   Wed Sep 14 14:24:55 2022 -0600

        Parallel plot refactor (#247)

        * Fix duplicate plotting in CRISPRessoBatch aggregate

        * Refactor multiprocessing plots in CRISPRessoBatch

        * Refactor multiprocessing plots in CRISPRessoCORE

        * Refactor multiprocessing plots for CRISPRessoAggregate

    commit 4ed5e24e6cc1dd8068e2391573ae2438acd32db2
    Author: Kendell Clement <[email protected]>
    Date:   Tue Sep 13 14:12:11 2022 -0400

        print files in curr dir if Aggregate can't find files

    commit ce25bc06f29988e7a10afd0b6a09ba0caf0950e0
    Author: Kendell Clement <[email protected]>
    Date:   Mon Sep 12 10:32:57 2022 -0400

        Spelling typo

    commit c15f01c75083403f17c58c121b2afe97e9f2a1ec
    Author: Kendell Clement <[email protected]>
    Date:   Tue Sep 6 17:49:52 2022 -0400

        Add helper function to create alignment scoring matrix

        New scoring matrix can be created using CRISPResso2Align.make_matrix()

    commit c80f82838c5a228b79ad4484092877cfee08e02c
    Author: Cole Lyman <[email protected]>
    Date:   Mon Aug 22 18:28:33 2022 -0600

        Add `zip_output` (#240)

        * Making zip of results

        * Zip command added; if zip is true, place_report_in_output_folder is also true. Zip removes all files while zipping

        * Adding --zip to compare and pooled/wgs compare

        * Add more formatting changes to CRISPRessoShared

        * Refactoring propagate_crispress_options so only one version exists

        * Zip added to arguments_to_ignore and warning added when changing arguments

        * Restore styling

        * Update README to include --zip

        * Rename --zip to --zip_output

        * Change --zip to --zip_output in CompareCORE and PooledWGSCompareCORE

        * Bug fix arg to args

        Co-authored-by: Samuel Nichols <[email protected]>

    commit 5de3d7286d8e33c7cf4d3615fce715806e72f511
    Author: Kendell Clement <[email protected]>
    Date:   Thu Aug 11 21:42:34 2022 -0400

        Fix fix to aggregate for CRISPRessoWGS

    commit a2294c266f43b14969a5d6474076f31a77a57173
    Author: Kendell Clement <[email protected]>
    Date:   Thu Aug 11 21:40:50 2022 -0400

        Fix bug in aggregate for WGS

    commit 7ce3eb4abe4b8ceac933272ac9cb16a8bedf26a3
    Author: Kendell Clement <[email protected]>
    Date:   Mon Aug 8 21:53:45 2022 -0400

        Update CRISPRessoWGS to allow non-word characters in region names

    commit 040ac0033d6e250f4e3a412101874cf5e914e08a
    Author: kclem <[email protected]>
    Date:   Mon Aug 8 16:04:59 2022 -0400

        Enable processing of cram files by CRISPRessoWGS

        Adds --reference to samtools view when viewing cram files
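
        A minimal sketch of the idea, assuming the command is assembled as an argument list. `build_samtools_view_command` and its parameters are hypothetical names for illustration, not code from CRISPRessoWGS; only `samtools view` and its `--reference` option are taken from the commit message.

```python
def build_samtools_view_command(alignment_file, region, reference_fasta=None):
    """Build a `samtools view` argument list for one region (illustrative helper)."""
    cmd = ["samtools", "view"]
    # CRAM files are reference-compressed, so pass the reference when one is given.
    if alignment_file.lower().endswith(".cram") and reference_fasta is not None:
        cmd += ["--reference", reference_fasta]
    cmd += [alignment_file, region]
    return cmd


# File names and region below are made up for the example.
cram_cmd = build_samtools_view_command("sample.cram", "chr1:100-200", "genome.fa")
bam_cmd = build_samtools_view_command("sample.bam", "chr1:100-200", "genome.fa")
print(" ".join(cram_cmd))
print(" ".join(bam_cmd))
```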

    commit cf112a0caba8789e28530cc09171285ec6ea9b4c
    Author: kclem <[email protected]>
    Date:   Mon Aug 8 14:55:46 2022 -0400

        Auto amplicon detection for interleaved input

        Enables processing of interleaved fastq files for guess_guides and guess_amplicons, as well as get_most_frequent_reads. When interleaved input is present, the input is first separated into R1/R2 files, then processing is performed.
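
        The separation step can be sketched as follows, assuming uncompressed 4-line FASTQ records that strictly alternate R1, R2. `split_interleaved_fastq` is a hypothetical helper name, not the function used in CRISPResso2.

```python
def split_interleaved_fastq(lines):
    """Split interleaved FASTQ lines into (r1_lines, r2_lines).

    Assumes each record is exactly 4 lines and records alternate R1, R2.
    """
    if len(lines) % 8 != 0:
        raise ValueError("interleaved input must contain complete R1/R2 record pairs")
    r1_lines, r2_lines = [], []
    for i in range(0, len(lines), 8):
        r1_lines.extend(lines[i:i + 4])      # first record of the pair
        r2_lines.extend(lines[i + 4:i + 8])  # second record of the pair
    return r1_lines, r2_lines


# Two made-up read pairs.
interleaved = [
    "@read1/1", "ACGT", "+", "IIII",
    "@read1/2", "TTGG", "+", "IIII",
    "@read2/1", "CCAA", "+", "IIII",
    "@read2/2", "GGTT", "+", "IIII",
]
r1, r2 = split_interleaved_fastq(interleaved)
print(r1[0], r2[0])
```

        Downstream steps such as guide or amplicon guessing can then run on the two separated lists (or files written from them) as they would for ordinary paired input.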

    commit 4ba524dc7b947feca8a0f743837844f9febc2171
    Author: Cole Lyman <[email protected]>
    Date:   Thu Aug 4 11:32:11 2022 -0600

        Potential fix for aggregate plots in Batch mode (#237)

    commit 6097a8a104d3f156ef7c08e196ac37e32bf04c71
    Author: Kendell Clement <[email protected]>
    Date:   Thu Jul 21 22:45:48 2022 -0400

        Fix pct_vectors in crispresso2_info json object

    commit 65a079d86d6f386793397398f839c46014b54543
    Author: Kendell Clement <[email protected]>
    Date:   Wed Jul 20 23:46:37 2022 -0400

        Fix more readme spelling bugs

    commit e817376ecd54cdea1f29e303ca25b9e7d1d38333
    Author: Kendell Clement <[email protected]>
    Date:   Wed Jul 20 23:42:23 2022 -0400

        Fix bug in readme spelling

    commit 49740ba1d66ed6d13a9e154b8b17bc8b5186581d
    Author: Kendell Clement <[email protected]>
    Date:   Wed Jul 20 16:10:09 2022 -0400

        Fix loading of crispresso info from WGS and Pooled

    commit b68a43271115251b18e8955e285ccc18f549e8cd
    Author: Kendell Clement <[email protected]>
    Date:   Thu Jul 14 14:11:04 2022 -0400

        Add plotly to dockerfile

    commit b0b7d41d697304d0d5fc93e3346c9de1b98ba41d
    Author: Kendell Clement <[email protected]>
    Date:   Thu Jul 14 14:10:00 2022 -0400

        Fix #231 Allow N's in bam output (Try 2)

    commit c460b3e73fd06a230dbac2e37c86b833144ebf94
    Author: Kendell Clement <[email protected]>
    Date:   Thu Jul 14 14:09:10 2022 -0400

        Revert "Fix #231 Allow N's in bam output"

        This reverts commit 2f6ad1dbe05210af9ccc1b1f17783cd212a888d3.

    commit 2f6ad1dbe05210af9ccc1b1f17783cd212a888d3
    Author: Kendell Clement <[email protected]>
    Date:   Thu Jul 14 13:52:37 2022 -0400

        Fix #231 Allow N's in bam output

    commit 0a2419e518dc9b3520058c3927f98b31cd51347e
    Author: Cole Lyman <[email protected]>
    Date:   Fri Jul 8 21:10:01 2022 -0600

        Fix bug when name is provided instead of amplicon_name in pooled input file (#229)

        Also, raise an exception (instead of incorrectly executing) when there are not
        enough matched parameters in the pooled input file.

    commit cb58212379803788c04ca5793baaa760cbbeaa81
    Author: Cole Lyman <[email protected]>
    Date:   Fri Jul 8 21:09:49 2022 -0600

        Fix bug when comparing two samples with the same name. (#228)

    commit e8a796f5f451409cbafed4404dfba4b6b8a124ca
    Author: Kendell Clement <[email protected]>
    Date:   Thu Jun 23 21:30:23 2022 -0400

        Version bump to 2.2.9

    commit 632143ddedea48bab9229baeb4bf3ea4d1f658d6
    Author: Cole Lyman <[email protected]>
    Date:   Mon Jun 20 19:53:14 2022 -0600

        Don't run global frameshift plot when there are no reads (#226)

        When there are no reads (i.e. global_MODIFIED_FRAMESHIFT +
        global_MODIFIED_NON_FRAMESHIFT + global_NON_MODIFIED_NON_FRAMESHIFT == 0) there
        was a bug when trying to compute the pie chart, because all of the values in the
        pie chart are 0. This fix makes sure that there is at least one read in
        order for the plot to be constructed properly.
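
        A minimal sketch of such a guard, assuming the three global counts are plain integers. `should_plot_global_frameshift` is a hypothetical helper name, not a function in CRISPResso2.

```python
def should_plot_global_frameshift(modified_frameshift,
                                  modified_non_frameshift,
                                  non_modified_non_frameshift):
    """Return True only when at least one read contributes to the pie chart."""
    total = modified_frameshift + modified_non_frameshift + non_modified_non_frameshift
    return total > 0


# With no reads, every wedge would be 0, so the plot is skipped.
if should_plot_global_frameshift(0, 0, 0):
    print("plotting global frameshift pie chart")
else:
    print("skipping global frameshift pie chart: no reads")
```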

    commit 4bb06218e835d2624d53fd401542caef6f8a3a55
    Author: kclem <[email protected]>
    Date:   Fri Jun 3 16:57:02 2022 -0400

        Improvements for guide inference in 'auto' mode

        In 'auto' mode, a putative guide sequence is selected at the site of maximal editing. If the site of maximal editing happens near the end of the guide (e.g. base 0), many things will break (e.g. quantification windows). This update excludes bases from being used to find the guide using the --exclude_bp_from_left and --exclude_bp_from_right parameters. By default, these parameters are 15bp, so the first and last 15bp cannot be selected as the site of maximal editing and thus cannot become the site of a guide sequence. In addition, the site of maximal editing must have 3x the magnitude of the background.
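
        The selection rule can be sketched roughly as below. This is an illustration under stated assumptions, not the CRISPResso2 implementation: `find_max_editing_site` is a hypothetical name, the input is assumed to be a list of per-base modification frequencies, and "background" is assumed to be the mean frequency of the other candidate positions.

```python
def find_max_editing_site(mod_freqs, exclude_bp_from_left=15,
                          exclude_bp_from_right=15, min_fold_over_background=3.0):
    """Return the index of maximal editing, or None if no site qualifies."""
    start = exclude_bp_from_left
    end = len(mod_freqs) - exclude_bp_from_right
    if end - start < 2:
        return None  # too few bases left to compare a peak against background
    candidates = range(start, end)
    best = max(candidates, key=lambda i: mod_freqs[i])
    # Assumption: background is the mean of all other candidate positions.
    others = [mod_freqs[i] for i in candidates if i != best]
    background = sum(others) / len(others)
    if mod_freqs[best] <= 0 or mod_freqs[best] < min_fold_over_background * background:
        return None
    return best


# Made-up frequencies: heavy editing at base 0 is ignored, the peak at base 25 is kept.
freqs = [0.9] + [0.01] * 49
freqs[25] = 0.5
print(find_max_editing_site(freqs))
```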

    commit 9d64de187835b2553ad2b4374d32edab27f83645
    Author: Kendell Clement <[email protected]>
    Date:   Thu Jun 2 20:22:25 2022 -0400

        Update README.md

    commit 6aafc5387986f5089ba55b68d128343d68052792
    Author: Simon P Shen <[email protected]>
    Date:   Tue May 31 17:42:53 2022 -0400

        directory in quotes in batch cmd (#222)

        Add quotes around output folder for folders that have spaces.
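
        A short illustration of why the quoting matters, assuming the sub-command is built as a shell string. It uses `shlex.quote` from the Python standard library, which is a swapped-in technique and not necessarily how the commit adds the quotes; `build_sub_command` is a hypothetical helper.

```python
import shlex


def build_sub_command(base_command, output_folder):
    """Append the output folder to a command string, quoted for the shell."""
    return "{} -o {}".format(base_command, shlex.quote(output_folder))


# A made-up folder name containing spaces.
cmd = build_sub_command("CRISPResso", "/data/my results/batch 1")
print(cmd)
# Splitting the way a POSIX shell would keeps the folder as a single argument.
print(shlex.split(cmd))
```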

    commit 432f163ac68b9a650d1fd326171aadc505ee87f4
    Author: Kendell Clement <[email protected]>
    Date:   Tue May 24 23:38:36 2022 -0400

        CRISPRessoBatch fills NA values in batch settings

        NA values in CRISPRessoBatch are filled with the value from args - either the default value or the value from the command line args (if set)
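
        A minimal sketch of the fill behavior, using plain dicts with `None` standing in for NA instead of the table a batch file would normally be read into. `fill_missing_settings` and the example rows are hypothetical.

```python
def fill_missing_settings(batch_rows, arg_values):
    """Fill missing (None) per-row settings with the value taken from args."""
    filled = []
    for row in batch_rows:
        new_row = dict(row)  # copy so the input rows are left untouched
        for key, fallback in arg_values.items():
            # A key that is absent from the row is treated like NA as well.
            if new_row.get(key) is None:
                new_row[key] = fallback
        filled.append(new_row)
    return filled


# arg_values holds either the default or the value set on the command line.
rows = [
    {"name": "sample1", "quantification_window_size": None},
    {"name": "sample2", "quantification_window_size": 5},
]
result = fill_missing_settings(rows, {"quantification_window_size": 1})
print([r["quantification_window_size"] for r in result])
```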

    commit 6de774adbad3aa8cd99d07b0ba7692984b356cd4
    Author: kclem <[email protected]>
    Date:   Mon May 23 14:18:02 2022 -0400

        Fix file naming bug for HDR outputs

        In html file, figures 4e and 4f incorrectly referenced figure 4d. This fixes this bug.

    commit b88fec0668a4082a12ead3d26582e86d829dd7cc
    Author: Kendell Clement <[email protected]>
    Date:   Sat May 21 00:32:15 2022 -0400

        For bam_output, fix bug that wrote unaligned lines twice

    commit 3564e77ebcdedb4b01cc01dcca18ba3221fac67c
    Author: Kendell Clement <[email protected]>
    Date:   Thu May 19 16:32:18 2022 -0400

        Update README with CRISPRessoPooled headers and bam_output parameters

    commit bc08d81f17cb1929d1c37a1773cffcf36fb12fe2
    Author: Kendell Clement <[email protected]>
    Date:   Thu May 19 16:11:30 2022 -0400

        Add more links to tools

    commit 006c497a379ecd94b017a883a5db887861e1586a
    Author: Kendell Clement <[email protected]>
    Date:   Thu May 19 16:08:14 2022 -0400

        Add links to tools

    commit dc8243373ad00d6bd467fc30c59942596ff0c5d6
    Author: Kendell Clement <[email protected]>
    Date:   Mon May 16 21:38:06 2022 -0400

        fastq_to_bam implementation (#219)

    commit e88b6833977c6b2768299e0b2e7af623e3a9ae7c
    Author: Kendell Clement <[email protected]>
    Date:   Sun May 8 02:14:13 2022 -0400

        Fix bug for when guides don't agree in CRISPRessoAggregate

    commit 7eb763116a8c60603f1cd654645215767ee8eb52
    Author: Kendell Clement <[email protected]>
    Date:   Thu May 5 03:28:21 2022 -0400

        Fix bug for case of empty summary plots in report generation

    commit 0324fa67d14ed945f0c9531d9bcf73ebcf4ca042
    Author: Kendell Clement <[email protected]>
    Date:   Thu May 5 03:28:02 2022 -0400

        Create report for number of significant bases in CRISPRessoCompare

    commit e3c9d0026a9ee6732f3ed6bdcf2a824850d7e66a
    Author: Kendell Clement <[email protected]>
    Date:   Wed May 4 22:43:11 2022 -0400

        Update pickle to json in readme and CRISPRessoPooledWGSCompare

    commit 1553f7977c12bf1091a20ca55b878bccfb739b61
    Author: Kendell Clement <[email protected]>
    Date:   Wed May 4 18:10:04 2022 -0400

        Merge pull request #4 from pinellolab/master (#218)

    commit bcecbfc047d294e26f381a6668e08cb4db24445c
    Merge: 15b0e05b bb13e007
    Author: Kendell Clement <[email protected]>
    Date:   Wed May 4 18:06:37 2022 -0400

        Merge branch 'master' into master

    commit bb13e007738d6e7a4909e01f03daff592f334f36
    Merge: af4ab6e8 d0b41483
    Author: Kendell Clement <[email protected]>
    Date:   Wed May 4 17:59:32 2022 -0400

        Merge branch 'master' of https://github.com/edilytics/CRISPResso2

    commit 15b0e05b9e03bbec5236e58776ddf9aa2f93180e
    Author: Kendell Clement <[email protected]>
    Date:   Wed May 4 17:54:52 2022 -0400

        2 flexible pooled input (#217)

        * Batch type coerce and r2 file check

        * Upgrade tabs for bootstrap5

        * Update readme with additional pooled amplicon file headers

        Co-authored-by: Samuel Nichols <[email protected]>

    commit d0b41483bee704940ba60c58289f412b04c71659
    Author: Kendell Clement <[email protected]>
    Date:   Wed May 4 13:43:43 2022 -0400

        Update README.md

    commit ce49fab5301cb73ba0daf6c765e350eb083c76f1
    Merge: 5f909713 b913fcb4
    Author: Kendell Clement <[email protected]>
    Date:   Wed May 4 13:40:30 2022 -0400

        Merge pull request #3 from edilytics/2-flexible-pooled-input

        Add flexibility to CRISPRessoPooled amplicon input by allowing headers. Also, prime editing and quantification window coordinate parameters can be passed to CRISPRessoPooled.

    commit b913fcb402a8ba3106c3ff7913563a33d8d19fca
    Author: Kendell Clement <[email protected]>
    Date:   Wed May 4 13:38:25 2022 -0400

        Update CRISPRessoPooledCORE.py

        Replace process to read header, increase flexibility for column order

    commit 945bf31f16530b7ce25b89095b2c7005bf146117
    Merge: 7b8f6788 5f909713
    Author: Kendell Clement <[email protected]>
    Date:   Wed May 4 12:45:24 2022 -0400

        Merge branch 'master' into 2-flexible-pooled-input

    commit 5f9097133765736a7c2fe3c8e9b730845fed0b70
    Author: Kendell Clement <[email protected]>
    Date:   Wed May 4 12:23:44 2022 -0400

        Version bump to 2.2.8

    commit c4a94ce0e06c6ebae13e128fbe6b708e635121c4
    Author: Kendell Clement <[email protected]>
    Date:   Wed May 4 00:13:17 2022 -0400

        Fix summary plot representation for multi reports

        * fixed old reference to make_multi_report which called old summary plot format
        * renamed summary_plot to summary_plots to reflect a dict with multiple plots

    commit 62900e9ae6fa37ce99a04f12a63ed5c912f75042
    Author: Cole Lyman <[email protected]>
    Date:   Tue May 3 20:47:52 2022 -0600

        Large aggregation (#192)

        * Squashed commit of the following:

        commit 8564eb03f0d9e62abf4b7528baf5c2ae296be8f9
        Merge: f6ef62c 07cc7d8
        Author: Kendell Clement <[email protected]>
        Date:   Tue Jan 11 16:20:15 2022 -0500

            Merge branch 'indel-alignment-fix' of https://github.com/edilytics/CRISPResso2 into indel-alignment-fix

        commit 07cc7d856ab3fcbbaa5381f17f29568192388887
        Author: Cole Lyman <[email protected]>
        Date:   Fri Dec 10 15:29:59 2021 -0700

            Fix bug in `find_indels_substitutions`

            This bug occurred when there was a deletion at the end of a sequence, and was
            thus not properly accounted for.

        commit f6ef62cfdf909adac1b10ea86555cd218f8b2a74
        Author: Cole Lyman <[email protected]>
        Date:   Fri Dec 10 15:29:59 2021 -0700

            Fix bug in `find_indels_substitutions`

            This bug occurred when there was a deletion at the end of a sequence, and was
            thus not properly accounted for.

        commit 7212f87f4be60057a6c848947ff6b5efde132a25
        Author: Cole Lyman <[email protected]>
        Date:   Fri Dec 10 15:26:17 2021 -0700

            Add a unit test for `find_indels_substitutions`

            This unit test checks for deletions at the end of a sequence, which are
            inherently outside of the include_indx_set window.

        commit d50b4e903b973c71a275e31d470b40e59280ee13
        Author: Cole Lyman <[email protected]>
        Date:   Fri Dec 10 15:03:22 2021 -0700

            Fix a bug in `find_indels_substitutions`

            The bug that this commit fixes is when an insertion occurs at the edge of the
            include indexes. The trouble with this earlier was that it was using the `idx`
            to calculate the size of the insertion, but the `idx` wasn't being incremented
            anymore because it was outside of the include window.

        commit 4db066f7bc333b7662a9232ac732ebb33ac3ace8
        Author: Cole Lyman <[email protected]>
        Date:   Fri Dec 10 15:01:39 2021 -0700

            Add test case for `find_indels_substitutions`

            This test case is extracted from the CRISPRessoBatch integration test and
            provides an example where there is an insertion at the edge of the include
            index.

        commit 3b3a7417f5bbd6c2785a2af54a47e01d2e820451
        Author: Cole Lyman <[email protected]>
        Date:   Fri Dec 10 11:37:07 2021 -0700

            Fix bug in CRISPRessoCompare where sample names were not properly set

            This was a place where it was (partially) missed during the crispresso2_info
            object refactoring.

        commit e9f5eff3d95b676b5ee2e23371a5604f600d34b2
        Author: Cole Lyman <[email protected]>
        Date:   Fri Dec 10 15:26:17 2021 -0700

            Add a unit test for `find_indels_substitutions`

            This unit test checks for deletions at the end of a sequence, which are
            inherently outside of the include_indx_set window.

        commit d4d45a918254ab19a7e7956e9e731389c6f36ecb
        Author: Cole Lyman <[email protected]>
        Date:   Fri Dec 10 15:03:22 2021 -0700

            Fix a bug in `find_indels_substitutions`

            The bug that this commit fixes is when an insertion occurs at the edge of the
            include indexes. The trouble with this earlier was that it was using the `idx`
            to calculate the size of the insertion, but the `idx` wasn't being incremented
            anymore because it was outside of the include window.

        commit 13f00bb40239c83e6e5cf844561fdb7000d3d9ab
        Author: Cole Lyman <[email protected]>
        Date:   Fri Dec 10 15:01:39 2021 -0700

            Add test case for `find_indels_substitutions`

            This test case is extracted from the CRISPRessoBatch integration test and
            provides an example where there is an insertion at the edge of the include
            index.

        commit 659ae34e8fd106f7ecc163b5bea0b5a80ab0283c
        Author: Cole Lyman <[email protected]>
        Date:   Fri Dec 10 11:37:07 2021 -0700

            Fix bug in CRISPRessoCompare where sample names were not properly set

            This was a place where it was (partially) missed during the crispresso2_info
            object refactoring.

        * Add parameter `--suppress_batch_summary_plots`

        If many runs are run at the same time, batch summary plots may fail because they are too large for matplotlib. This parameter `--suppress_batch_summary_plots` allows individual runs to be plotted, but suppresses batch summary plots that may otherwise be too big.

        * Pep formatting cleanup

        * Add summary nucleotide plots to aggregate

        * Aggregate plots are paginated

        * Update CRISPRessoAggregateCORE.py

        Remove max sample limit for plotting

        * Add --max_samples_per_summary_plot to CRISPRessoAggregate

        Parameterize the max number of samples to plot on each page of reports. Additional PDFs will be created with this number of samples on them.

        * Add plotly function to plot an interactive heatmap

        * Fix deprecated numpy type to suppress warning

        * Add plotting of heatmaps to CRISPRessoAggregateCORE to summarize modification types

        These heatmaps are interactive (zoomable and pannable) and show for each sample
        the percentage of insertions, substitutions, and deletions.

        * Add the heatmap summaries to the CRISPRessoAggregate report

        * Update Bootstrap to 5.1.3

        This is mainly so that we can use the fullscreen modal functionality in this version.

        * Move the plotly heatmaps to a Bootstrap modal

        * Fix bug where plots were not filling up entire modal.

        I have tried countless different ways for this to work, and this is the best
        that I can come up with. After the modal is opened it triggers the plot to
        resize, and then for some reason you need to trigger the resize event. I think
        this is because a `div` changing size won't actually trigger the resizing of the
        plot (and neither will just calling `Plotly.Plots.resize`...?!).

        * Update the axis labels and add autosize to plotly heatmaps

        I'm pretty sure the autosize doesn't do anything, but it is there for good
        measure.

        * Abandon attempts to make plots fullscreen

        This includes removing the Bootstrap modal (two out of the three plots would
        resize properly and I couldn't figure out a way to have the plot displayed
        outside of the modal). I have left in some javascript to make the plot
        fullscreen, but I couldn't get the formatting quite right and the plot wasn't
        much bigger in the fullscreen version because there was a ton of space between
        the plot and the heatmap. If some brave soul would like to tackle it, feel free!

        * Rename and refactor how plot data is passed around

        I have consolidated how the plot data is passed around, so that now you can pass
        in only one dict with all of the information instead of 4 or 5 separate
        parameters. I also renamed the `heatmap_plot_*` to
        `allele_modification_heatmap_*`.

        * Implement the line plot version of the modification percentages

        This also includes correctly resizing the plot when the line plot tab is
        selected!

        * Change default `max_samples_per_summary_plot` to be 150 instead of 250

        * Remove extra assignments of `this_number_samples` and suppress plot

        The plot that is suppressed is the large nucleotide quilt when there is a large
        number of samples. Is it okay to suppress this plot @kclem?

        * Implement parallel plotting in CRISPRessoAggregate

        * Fix sample indexing error and heatmap scaling for large number of samples

        * Add plotly requirement to setup.py

        * Remove space around vertical barcharts

        * Add scrollbar to long images in multiReport

        * Fill in default (empty) values to allele modification plots

        When not running CRISPRessoAggregate, default values for the
        `allele_modification_heatmap_plot` and `allele_modification_line_plot`
        dictionaries will be set so that the template can be properly rendered.

        * Include CRISPRessoBatch in the refactor of how summary_plot dicts are handled

        * Update dockerfile for new docker

        * minor bug fixes for plotCustomAllelePlot.py to work with Python3 (#212)

        * Allow for flexible parsing of quant window coordinates

        * CRISPRessoPooled debug flash command, fix pep formatting

        * Set flexiguide homology parameter type to int

        * Coerce ints in batch file checking (#200)

        * Batch type coerce and r2 file check

        * Revert "Batch type coerce and r2 file check"

        This reverts commit f91736688ea9739cf3063e3601c52ad6da1116a4.

        * Coerce int values

        * Handle multiple qwcs in batch mode

        If multiple qwcs were provided in batch mode, a parsing error would occur. This fixes this bug.

        * Fix bug from old pandas for int cols

        Evidently old pandas versions throw an error if a column doesn't exist. This checks to see if the column exists before the values are set.

        * Create allele modification heatmaps and line plots in CRISPRessoBatch

        * Add allele modification heatmaps and line plots to CRISPRessoBatch

        * Make all plots in CRISPRessoBatch run in parallel

        * Make `--suppress_batch_summary_plots` store true

        Also, only open and shutdown the process pool when necessary.

        * Add blank values for allele_modification entries when not present

        Co-authored-by: Kendell Clement <[email protected]>
        Co-authored-by: dharjanto <[email protected]>
        Co-authored-by: Samuel Nichols <[email protected]>

    commit f67376fc9ab0e407d4086aa42fd1c77706ebc9c0
    Author: Kendell Clement <[email protected]>
    Date:   Fri Apr 15 00:46:30 2022 -0400

        Fix bug from old pandas for int cols

        Evidently old pandas versions throw an error if a column doesn't exist. This checks to see if the column exists before the values are set.

    commit b34fe2956ff88629809b2434878028723dfc4895
    Author: Kendell Clement <[email protected]>
    Date:   Thu Apr 14 23:58:07 2022 -0400

        Handle multiple qwcs in batch mode

        If multiple qwcs were provided in batch mode, a parsing error would occur. This fixes this bug.

    commit c94e3b9f2e301bda91e9c1e6f4ef794b33b5dbf0
    Author: Samuel Nichols <[email protected]>
    Date:   Thu Apr 14 21:48:32 2022 -0600

        Coerce ints in batch file checking (#200)

        * Batch type coerce and r2 file check

        * Revert "Batch type coerce and r2 file check"

        This reverts commit f91736688ea9739cf3063e3601c52ad6da1116a4.

        * Coerce int values

    commit fc4542491bb86eb143db0044a848a56234403496
    Author: Kendell Clement <[email protected]>
    Date:   Thu Apr 14 22:13:23 2022 -0400

        Set flexiguide homology parameter type to int

    commit 23fe2aa8e26067d1bcf36bfafc67e023c7588d2f
    Author: Kendell Clement <[email protected]>
    Date:   Thu Apr 14 22:12:37 2022 -0400

        CRISPRessoPooled debug flash command, fix pep formatting

    commit d292d33d8c1fa3bfd2cee656643fd47bcdab161d
    Author: Kendell Clement <[email protected]>
    Date:   Thu Apr 14 22:00:19 2022 -0400

        Allow for flexible parsing of quant window coordinates

    commit e1667cb53a7ea6fbb33369c8530a78639ed423ec
    Author: dharjanto <[email protected]>
    Date:   Mon Apr 11 22:08:21 2022 -0400

        minor bug fixes for plotCustomAllelePlot.py to work with Python3 (#212)

    commit 7b8f6788da18f6ab173fa3c3d10f4ab6bb2acc26
    Author: Samuel Nichols <[email protected]>
    Date:   Fri Apr 8 10:21:00 2022 -0600

        Update README

    commit 9bc24cd0474ed9f398dff64274d3181c4b2f8637
    Author: Samuel Nichols <[email protected]>
    Date:   Tue Mar 29 11:25:09 2022 -0600

        Using Amplicon_Name

    commit 88ac5d72074b3da63de035e02c911ce34cd29414
    Merge: b6057a2d e5afa478
    Author: Samuel Nichols <[email protected]>
    Date:   Mon Mar 28 22:32:09 2022 -0600

        Merge remote-tracking branch 'origin/master' into 2-flexible-pooled-input

    commit b6057a2d54cb8637ff0900416de8e2de72213f76
    Author: Samuel Nichols <[email protected]>
    Date:   Mon Mar 28 20:53:05 2022 -0600

        Printing info statements for matched headers

    commit af4ab6e8507d7aa4b7b68f217a458e0d9c966f55
    Merge: bbb7d6f0 51a943c3
    Author: Cole Lyman <[email protected]>
    Date:   Fri Mar 25 09:44:13 2022 -0600

        Merge branch 'pinellolab:master' into master

    commit 3c1eb012fc02563e3e963f17a62c7e932f5bcddc
    Author: Samuel Nichols <[email protected]>
    Date:   Thu Mar 24 12:31:43 2022 -0600

        Debugging and column checking

    commit 0b47acbc592a6df6adf14641357b2104b76be691
    Author: Samuel Nichols <[email protected]>
    Date:   Wed Mar 23 09:42:51 2022 -0600

        New variables added to pooled

    commit a0ff3a44d6d19d7b37f91919b5c0180206f72d53
    Author: Samuel Nichols <[email protected]>
    Date:   Mon Mar 21 09:32:28 2022 -0600

        Read as string not bytes

    commit 710675fc3c0307e21103abd604315b47ff80a894
    Author: Samuel Nichols <[email protected]>
    Date:   Wed Mar 16 13:51:30 2022 -0600

        Adding command building for new options

    commit f386818a48e5c840bd567611e6f1320c8146cac7
    Author: Samuel Nichols <[email protected]>
    Date:   Wed Mar 16 10:08:33 2022 -0600

        Comment out df_template.iloc instance

    commit eb5e309da57c8b96cd760728ddbf67be05f30d1c
    Author: Samuel Nichols <[email protected]>
    Date:   Wed Mar 16 09:59:19 2022 -0600

        Potential solution for flexible headers

    commit 51a943c3a8f8181963acc420e75a5e8ee103cf7c
    Author: Kendell Clement <[email protected]>
    Date:   Tue Mar 15 11:00:46 2022 -0400

        CRISPRessoPooled pep formatting and fix

        CRISPRessoPooled doesn't re-count reads if it has been run once and the `aligned_pooled_bam` is provided as input
        pep code formatting changes

    commit bbb7d6f0907aa13518d20e7f470e7de518b825f4
    Merge: ddbd39f0 5a10d638
    Author: Kendell Clement <[email protected]>
    Date:   Tue Mar 15 10:23:38 2022 -0400

        Merge branch 'master' of https://github.com/edilytics/CRISPResso2

    commit 5a10d638c638f21f8a2934955e92ef7e117b889e
    Author: Kendell Clement <[email protected]>
    Date:   Sat Feb 26 14:21:57 2022 -0500

        Move metadata for bam input and output

    commit e5afa4784d5330a1dc95c5deafcd9217edeac631
    Author: Samuel Nichols <[email protected]>
    Date:   Wed Feb 16 10:20:24 2022 -0700

        Coerce int values

    commit ede7d85b50055311908000578c76a1860ae9de4d
    Author: Samuel Nichols <[email protected]>
    Date:   Wed Feb 16 10:18:29 2022 -0700

        Revert "Batch type coerce and r2 file check"

        This reverts commit f91736688ea9739cf3063e3601c52ad6da1116a4.

    commit f91736688ea9739cf3063e3601c52ad6da1116a4
    Author: Samuel Nichols <[email protected]>
    Date:   Wed Feb 16 10:10:52 2022 -0700

        Batch type coerce and r2 file check

    commit 7b4a310b0f8b64c00e02eca3d522ad50d39b43ae
    Author: Kendell Clement <[email protected]>
    Date:   Tue Feb 15 22:18:05 2022 -0500

        Reiterate WGS region file is tab-separated

        Add note to WGS description that region file should be tab-separated. Closes #199

    commit b8497542e388ad401d0815d426f27abc3201a76d
    Author: kclem <[email protected]>
    Date:   Fri Feb 11 15:07:14 2022 -0500

        Extend x-axis to longest scaffold incorporation length

    commit ab7248947afade089809c74bfe6e9d5394e8f6dc
    Author: kclem <[email protected]>
    Date:   Wed Feb 9 17:05:11 2022 -0500

        Fix prime editing indexing for plots

    commit ddbd39f06b262d5ebd2cc69e116c08b22b6bd84e
    Merge: a7ffd468 442a48c7
    Author: Kendell Clement <[email protected]>
    Date:   Thu Jan 13 15:35:36 2022 -0500

        Merge branch 'pinellolab:master' into master

 …
Snicker7 added a commit to edilytics/CRISPResso2 that referenced this pull request Apr 4, 2024
commit 22fc03183a8070c30dfb74d5c23575ac19019855
Author: Samuel Nichols <[email protected]>
Date:   Fri Jan 12 08:54:01 2024 -0700

    Add guardrail partial

commit e55f6b21972b578261bc5a864ce1d653d98f9e34
Author: Samuel Nichols <[email protected]>
Date:   Mon Jan 8 07:50:59 2024 -0700

    Functional guardrails, needs reports update

commit 6e968e9699ed59a47d88191d03768e042d8b60a4
Merge: 32b49685 e948ce10
Author: Samuel Nichols <[email protected]>
Date:   Mon Dec 18 13:34:36 2023 -0700

    Merge branch 'guardrails-clean-history' of https://github.com/edilytics/CRISPResso2 into guardrails-clean-history

commit 32b49685da320501dad2b0ebbb57887b66220ba8
Author: Samuel Nichols <[email protected]>
Date:   Fri Dec 15 15:27:04 2023 -0700

    Include guardrail functions

commit 4e309cf6f732565d635de3d4c5d074ada3027e2d
Author: Cole Lyman <[email protected]>
Date:   Mon Dec 18 10:51:55 2023 -0700

    Refactor to use CRISPRessoReports module

commit e648dc087c0055bc5d2fca13c64071a371dea941
Author: Cole Lyman <[email protected]>
Date:   Mon Dec 18 10:51:11 2023 -0700

    Add CRISPRessoReports subtree

commit e948ce107ebb0d1d99010ed12e937f34b5e607d4
Author: Samuel Nichols <[email protected]>
Date:   Fri Dec 15 15:27:04 2023 -0700

    Include guardrail functions

commit d33c748871a625facfe8d792e29c77ab9779138f
Author: Kendell Clement <[email protected]>
Date:   Tue Nov 7 16:31:06 2023 -0700

    Include parameter --assign_ambiguous_alignments_to_first_reference in readme

commit a1435f7f491a6a61434f3051e39f39a4c9bf1edc
Author: Kendell Clement <[email protected]>
Date:   Wed Oct 11 17:17:30 2023 -0600

    Enable quantification by sgRNA (#348)

    This PR includes:
    - storing the sgRNA-specific editing locations in the crispresso2_info object. Previously, each amplicon would record the indices of quantification windows across the guide, but not for individual guides. This stores the information for each guide in crispresso2_info['results']['refs'][reference_name]['sgRNA_include_idxs']
    - a script (count_sgRNA_specific_edits.py) to parse through an allele table output from a completed CRISPResso run (`--write_detailed_allele_table` flag required) to count edits in each sgRNA separately.

    I don't have a good double-edited sample handy, but it can be run on the demo HDR data [hdr.fastq.gz](http://crispresso.pinellolab.org/static/demo/hdr.fastq.gz) using the command:

    ```

    CRISPResso -r1 hdr.fastq.gz -a acatttgcttctgacacaactgtgttcactagcaacctcaaacagacaccatggtgcatctgactcctgTggagaagtctgccgttactgccctgtggggcaaggtgaacgtggatgaagttggtggtgaggccctgggcaggttggtatcaaggtta -e acatttgcttctgacacaactgtgttcactagcaacctcaaacagacaccatggtgcaCctgactccGgaggagaagtctgccgttactgcGctgtggggcaaggtgaacgtggatgaagttggtggtgaggccctgggcaggttggtatcaaggtta -c atggtgcatctgactcctgTggagaagtctgccgttactgccctgtggggcaaggtgaacgtggatgaagttggtggtgaggccctgggcag -g TGCACCATGGTGTCTGTTTG,GATGAAGTTGGTGGTGAGGCCC --write_detailed_allele_table  -n hdr3 -p max -gn guide1,guide2
    ```

    ```
    python CRISPResso2/scripts/count_sgRNA_specific_edits.py -f CRISPResso_on_hdr3
    ```

    This produces:
    ```
    Processed 25000 alleles
    Reference: Reference (2391/23415 modified reads)
            UNMODIFIED: 21024
            MODIFIED guide1: 2359
            MODIFIED guide2: 32
    Reference: HDR (856/1577 modified reads)
            UNMODIFIED: 721
            MODIFIED guide1: 854
            MODIFIED guide1 + guide2: 1
            MODIFIED guide2: 1
    ```

commit 2e3da02fdbed2fa8ae02a277763d65a502459827
Author: Cole Lyman <[email protected]>
Date:   Tue Oct 10 15:29:08 2023 -0600

    changed tuple to list for matplotlib change (#31) (#346)

    Co-authored-by: mbowcut2 <[email protected]>

commit cd3c332135fe4db0f9218e3d87263d5c65838ed9
Author: Kendell Clement <[email protected]>
Date:   Sun Oct 1 01:54:46 2023 -0600

    rename script to camel case

commit 7c719d65fb36ac7654db9040f226564ea28fcab9
Author: Kendell Clement <[email protected]>
Date:   Sun Oct 1 01:53:44 2023 -0600

    Add new script for counting high quality bases

commit f97cd2795e89464bcc9321ccfdbca3e6af2bcb4f
Author: Kendell Clement <[email protected]>
Date:   Thu Sep 14 15:15:30 2023 -0600

    Prime editing alignment params (#336)

    Adds two parameters to control alignment of pegRNA components: --prime_editing_gap_open_penalty and --prime_editing_gap_extend_penalty.

    CRISPResso checks to see whether the pegRNA spacer and extension sequence are in the correct orientation, but sometimes they could align in the incorrect orientation with a higher score (e.g. via insertion of multiple gaps, whereas a single long gap would be preferred). Introducing these two parameters allows users to adjust the alignment parameters specifically for these prime-editing checks without adjusting the global alignment parameters which will be applied to reads that are aligned to the WT reference/prime-editing reference sequences.

    The new prime_editing_gap_open_penalty is set to -50, a higher gap open penalty than the default needleman_wunsch_gap_open penalty (-20). This commit breaks backward-reproducibility, but mostly in the checking of pegRNA component orientation - so previously some CRISPResso runs would have failed and produced an error, but now they will (hopefully) succeed. To achieve complete backward reproducibility, add the flag --prime_editing_gap_open_penalty -20 to runs.

commit 64cbf36dae85cffa2c15e73f2a7ee8aa1077d917
Author: Cole Lyman <[email protected]>
Date:   Thu Sep 7 16:43:30 2023 -0600

    Fix samtools piping (#325)

    * Remove samtools pipe stderr to stdout

    Sometimes some of the libraries that samtools depends on don't have the correct
    version information, and as such samtools will report this to stderr when run.
    Because we pipe the output of samtools, we expect it to be valid SAM format, but
    when these library version messages are reported, it breaks CRISPRessoWGS.

    * Remove extra spacing at end of lines and add missing comma in WGS

    * Log stderr from samtools in CRISPRessoWGS
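
Editor's sketch of the idea behind the samtools fix above: keep stdout (the data) and stderr (diagnostics) on separate pipes so that warnings cannot end up in the parsed records. The child process here is a stand-in Python one-liner rather than samtools, and `run_and_split` is a hypothetical helper; the actual CRISPRessoWGS code may differ.

```python
import subprocess
import sys


def run_and_split(cmd):
    """Run cmd, returning (stdout_lines, stderr_text) kept separate."""
    proc = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,  # not subprocess.STDOUT, so warnings stay out of the data
        text=True,
        check=True,
    )
    return proc.stdout.splitlines(), proc.stderr


# Stand-in child: one SAM-like record on stdout, one warning on stderr.
child = [
    sys.executable,
    "-c",
    "import sys; print('read1\\t0\\tchr1'); print('lib version warning', file=sys.stderr)",
]
records, diagnostics = run_and_split(child)
```

The captured `diagnostics` text can then be written to a log instead of being discarded.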

commit 8feff4101f27406d9d88ace97d31a518276bff3f
Author: Cole Lyman <[email protected]>
Date:   Fri Sep 1 09:43:56 2023 -0600

    Replace link to CRISPResso schematic with raw URL in README (#329)

    * Replace link to CRISPResso schematic with raw URL

    * Add new lines to the beginning of unordered lists

commit 2e9e6bff5bcc536d5e2ba1440d1ab96d9d47efd6
Author: Kendell Clement <[email protected]>
Date:   Thu Aug 10 00:52:12 2023 -0600

    Try to unbreak CircleCI

commit ae5b95246cb0f6d66c4cbfb50cf8f5a9626b0827
Author: Kendell Clement <[email protected]>
Date:   Thu Aug 10 00:17:27 2023 -0600

    Center command line text messages

commit 4d9c71ecf2248c9bb1e10430178dc318b6621c8b
Author: Kendell Clement <[email protected]>
Date:   Thu Aug 10 00:17:07 2023 -0600

    Fix bug in prime-editing scaffold-incorporation plotting

    If a read is too short, scaffold incorporation detection will fail because it will check beyond the length of the read.

commit 2b36a1a5c35e8a93516ce8baf464595615e0f402
Author: Kendell Clement <[email protected]>
Date:   Wed Aug 9 15:29:48 2023 -0600

    CRISPRessoPooled --compile_postrun_references bug fixes

commit 3e04d1d402bcf95edd39fc7c8c9af61bb380f9db
Author: Kendell Clement <[email protected]>
Date:   Tue Aug 8 23:30:15 2023 -0600

    Fix missing ' in Pooled --demultiplex_only_at_amplicons

commit 06af527f9e2020c5cf251e7f1cec0b1eca1c1664
Author: Cole Lyman <[email protected]>
Date:   Mon Jul 24 10:47:46 2023 -0600

    Sort pandas dataframes by # of reads and sequences so that the order is consistent (#316)

    * Make sorting stable

    * Including c files

    * Sort by #Reads instead of %Reads to avoid floating point errors

    ---------

    Co-authored-by: Samuel Nichols <[email protected]>
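
Editor's sketch of the ordering idea in the commit above, assuming alleles are plain `(sequence, n_reads)` tuples. The real code sorts pandas dataframes, and `sort_alleles` is a hypothetical name. Sorting on the integer read count and breaking ties on the sequence gives the same order regardless of the input order.

```python
def sort_alleles(alleles):
    """Sort (sequence, n_reads) pairs by read count descending, then sequence ascending."""
    return sorted(alleles, key=lambda allele: (-allele[1], allele[0]))


alleles = [("ACGT", 5), ("TTGA", 12), ("ACGA", 5)]
ordered = sort_alleles(alleles)
```

Integer counts compare exactly, while percentages derived from them are floats and may differ in the last bits depending on how they were computed.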

commit de05533b3511a84f3b6b14fc2ef64db041613261
Author: Cole Lyman <[email protected]>
Date:   Thu Jul 6 13:54:45 2023 -0600

    Fix multiprocessing lambda pickling (#311)

    * Fix running plots in parallel

    The reason the plots were running slower before this change is because I was
    calling the plot function, not passing it to `submit`. So it was essentially
    running in serial, but worse because it was still spinning up/down the
    processes.

    * Fix multiprocessing lambda pickling (#20)

    * Refactor process_futures to be a dict

    This makes debugging much easier because you can associate the arguments to the
    future with the results.

    * Fix the pickling error when running in multiprocessing

    Only top-level functions (not lambdas) can be pickled to use in multiprocessing
    pools, thus the lambdas are converted to a regular function.

    * Further fixes to pickling multiprocessing error (#21)

    * Refactor process_futures to be a dict

    This makes debugging much easier because you can associate the arguments to the
    future with the results.

    * Fix the pickling error when running in multiprocessing

    Only top-level functions (not lambdas) can be pickled to use in multiprocessing
    pools, thus the lambdas are converted to a regular function.

    * Use Counter instead of defaultdict in CRISPRessoCORE

    * Update process_futures to dict in Batch and Aggregate
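
Editor's sketch of the pickling constraint mentioned in the commit above. The names `double` and `can_pickle` are illustrative and not taken from CRISPResso. A function defined at module top level is pickled by reference to its name, whereas a lambda has no importable name and is rejected.

```python
import pickle


def double(x):
    """Top-level function: pickled by reference to its module-level name."""
    return 2 * x


def can_pickle(obj):
    """Return True if obj survives pickle.dumps, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError):
        return False


top_level_ok = can_pickle(double)
lambda_ok = can_pickle(lambda x: 2 * x)
```

Process pools pickle the callable before sending it to a worker, which is why converting the lambdas to regular functions resolves the error.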

commit ebb016dff46c280dce8c3c09e8ac0e0cc25d4d74
Author: Kendell Clement <[email protected]>
Date:   Mon Jul 3 17:12:09 2023 -0600

    Enable CRISPRessoPooled multiprocessing when os allows multi-thread file append

commit 7285da0e987b77b72c8885bb35940e0f50c146bd
Author: Kendell Clement <[email protected]>
Date:   Fri Jun 23 16:50:33 2023 -0600

    Fix print bug for invalid fastq

commit 9acdeac67441f9a1d55ac94b153bcb68fb89b92c
Author: kclem <[email protected]>
Date:   Wed Jun 21 16:03:48 2023 -0600

    Slugify before creating filename - replaces invalid characters in batch names with _
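
Editor's sketch of what such a slugify step might look like. This `slugify` is hypothetical, and the set of characters it keeps is an assumption; the function used in CRISPResso may keep a different set.

```python
import re


def slugify(name):
    """Replace every character outside [A-Za-z0-9_.-] with an underscore."""
    return re.sub(r"[^\w.\-]", "_", name, flags=re.ASCII)


safe_name = slugify("batch 1/sample:A")
```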

commit f97e29c67de4c80b8d6b9cf334f363be4b514ade
Author: Cole Lyman <[email protected]>
Date:   Wed Jun 21 14:43:43 2023 -0600

    Add verbosity argument to CRISPRessoAggregate (#18) fixes #306 (#307)

    * Add verbosity argument to CRISPRessoAggregate (#18)

    * Allow for amplicon and guide seqs to be some variant of NA in batch (#19)

    This was discovered when attempting to infer amplicon sequences in batch mode on
    the web interface, NAs were supplied for the amplicon sequences to the sub
    CRISPResso commands.

commit 32e1e9797da5c3033cdc588e92f06b8813961953
Author: Mark Clement <[email protected]>
Date:   Wed Jun 21 14:01:00 2023 -0600

    Allow for interrogation of overlapping sgRNA sites

commit 7248ba8c4deee125ad1ec12fdf1294a84d5f6f93
Author: Kendell Clement <[email protected]>
Date:   Mon Jun 12 12:16:47 2023 -0600

    Check input fastq file format

    Asserts input format of fastq files - including if gzipped files are missing the gz suffix.

commit 83c8ab8f462e7d8c1d04c08c1a398b874f517251
Author: Kendell Clement <[email protected]>
Date:   Mon Jun 5 13:41:55 2023 -0600

    Fix CRISPRessoArgParser

commit 14a2c8577f566e1b72d5f4e72cd6cd22079610be
Author: Kendell Clement <[email protected]>
Date:   Mon Jun 5 13:29:31 2023 -0600

    Cosmetic updates for command-line use

    - version bump to 2.2.13
    - If no args are provided, the command line version will print out an abbreviated help message
    - parameters can be excluded from CRISPRessoArgParser

commit 1cd54bc1d03360c3d8121ba9e66b3589fe1cf252
Author: Cole Lyman <[email protected]>
Date:   Thu May 11 14:31:47 2023 -0600

    Fix multiprocessing error, don't start pool when only using single thread (#302)

    * Update README to have consistent use of `--base_editor_output` (#16)

    * Add files via upload

    * Only start process pools when using multiple processes

    This is mainly to solve the issue when running on AWS Lambda, but this should
    improve single core performance overall.

    ---------

    Co-authored-by: Kendell Clement <[email protected]>

commit 92a705c939b370373a70cf6ae9f1616de33288b9
Author: Cole Lyman <[email protected]>
Date:   Thu May 11 14:31:06 2023 -0600

    Update `base_editor` parameters in README and add Plot Harness (#301)

    * Update README to have consistent use of `--base_editor_output` (#16)

    * Add files via upload

    ---------

    Co-authored-by: Kendell Clement <[email protected]>

commit 7d46c4490235df45c5546b1b470e4e6a99727031
Author: Cole Lyman <[email protected]>
Date:   Wed May 10 15:41:33 2023 -0600

    Clarify CRISPRessoWGS intended use (#303)

    * Update README to have consistent use of `--base_editor_output` (#16)

    * Add sample plotting jupyter notebook

    * Add clarifying info to CRISPRessoWGS description

    Clarify WGS usage

commit 833a701787bb47674b3e921c38cac6189c775cf7
Author: Kendell Clement <[email protected]>
Date:   Thu May 4 17:02:46 2023 -0400

    Remove debug print statements

commit 712eb2a11825e8d36f2870deb12b35486bd633fb
Author: Kendell Clement <[email protected]>
Date:   Thu May 4 16:40:07 2023 -0400

    Allow dashes in filenames resolve #73

commit a439f094745b2b5e7f032f0777d4c67e6d6f93c5
Author: Kendell Clement <[email protected]>
Date:   Sat Apr 22 23:41:58 2023 -0400

    Raise exceptions from within futures in plot_pool

commit 7e807a60de2a9d18bccd034b87106ceaf7153338
Author: Kendell Clement <[email protected]>
Date:   Sat Apr 22 23:38:56 2023 -0400

    Fix future pandas indexing warning

    Pandas error was "FutureWarning: Calling float on a single element Series is deprecated and will raise a TypeError in the future. Use float(ser.iloc[0]) instead"

commit 304a92aa7a7ef8c705cb070dce25d9a2e5745ba9
Author: Cole Lyman <[email protected]>
Date:   Thu Apr 20 13:59:27 2023 -0600

    Remove debug print statements fixes #295 (#297)

    The format string option used here is only available in Python version >=3.8.

commit 478c06f784603e96d20f96e91993fdcc4ac35c8a
Author: Kendell Clement <[email protected]>
Date:   Thu Apr 13 12:09:26 2023 -0400

    Update plotCustomAllelePlot.py script for #292 (#293)

    Update type of 'max_rows' param to int
    Fix location of 'args' in crispresso2_info object

commit bcdae39e05d530f4a4e78738c3b30f7664981919
Author: Kendell Clement <[email protected]>
Date:   Mon Mar 27 13:18:34 2023 -0400

    Update pooled parameter format

commit 546446e36e7e68b527767d6c31ec341a49df2059
Author: Kendell Clement <[email protected]>
Date:   Tue Feb 14 16:26:23 2023 -0500

    Fix running plots in parallel (#286)

    The reason the plots were running slower before this change is because I was
    calling the plot function, not passing it to `submit`. So it was essentially
    running in serial, but worse because it was still spinning up/down the
    processes.

    Co-authored-by: Cole Lyman <[email protected]>
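
Editor's sketch of the difference described in the commit above between calling a function and passing it to `submit`. `plot` is a stand-in, and a thread pool is used here only to keep the example small; the commit message suggests the project uses processes.

```python
from concurrent.futures import ThreadPoolExecutor


def plot(name):
    """Stand-in for a plotting function; returns a label instead of drawing."""
    return f"plotted {name}"


with ThreadPoolExecutor(max_workers=2) as pool:
    # Wrong: pool.submit(plot("a")) would run plot("a") right here in the
    # calling thread and then submit its return value (a str) as the callable.
    # Right: pass the function and its arguments separately.
    futures = [pool.submit(plot, name) for name in ("a", "b")]
    results = [future.result() for future in futures]
```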

commit d75f32a2eb5aeaaee866c09e5655a3e27af8b1a1
Author: kclem <[email protected]>
Date:   Fri Feb 10 15:45:15 2023 -0500

    Fix #283 to avoid filename collisions

    Previously, amplicon names longer than 21bp were truncated, but the check for uniqueness wasn't working, so it would overwrite some plot files. This fixes the filename collision and enforces uniqueness in reference filename prefixes. Thanks @mbiokyle29
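
Editor's sketch of one way to keep truncated filename prefixes unique. The commit does not say how uniqueness is enforced, so the numeric-suffix scheme and the name `unique_prefixes` are assumptions.

```python
def unique_prefixes(names, max_len=21):
    """Truncate each name to max_len characters, adding a numeric suffix on collision."""
    seen = set()
    prefixes = []
    for name in names:
        base = name[:max_len]
        candidate = base
        counter = 2
        while candidate in seen:
            candidate = f"{base}_{counter}"
            counter += 1
        seen.add(candidate)
        prefixes.append(candidate)
    return prefixes


names = ["very_long_amplicon_name_exon1", "very_long_amplicon_name_exon2", "short"]
prefixes = unique_prefixes(names)
```

Note that in this sketch a suffixed prefix is longer than `max_len`.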

commit e577318006cd17b2725bd028e5e56634c6eb829a
Author: kclem <[email protected]>
Date:   Mon Feb 6 16:37:25 2023 -0500

    Case-insensitive headers accepted in CRISPRessoPooled

commit d34927620a4a6126a9988b3041e76f60728abbfe
Author: Kendell Clement <[email protected]>
Date:   Tue Jan 31 13:48:33 2023 -0500

    Fix print statement in CORE

commit ee88b7ed89c395f68225a50dea44a2ad69d5e9a5
Author: Kendell Clement <[email protected]>
Date:   Tue Jan 31 13:22:51 2023 -0500

    Version bump to 2.2.12

commit 1d4679c72d0c8b4154317c9aff5179217198e2d7
Author: Kendell Clement <[email protected]>
Date:   Tue Jan 31 13:01:31 2023 -0500

    Status Updates + Pooled Mixed Mode Update (#279)

    * Implement logging handler to overwrite the latest log status to file

    * Add StatusHandler to CRISPRessoCORE log

    This will take the latest log output and write it to a file (`status.txt`), the
    catch being that with each log the file is overwritten so that one can easily
    tell where CRISPResso currently is and what the error is (if any). These changes
    include some slight refactoring in order to accommodate any potential parameter
    exceptions.

    * Add StatusHandler to CRISPRessoBatch and refactor `logger.warn` to `warn`

    * Add StatusHandler to CRISPRessoPooled and a little refactoring

    * Implement `percent_complete` to the status log

    * Add StatusHandler to CRISPRessoAggregate log

    * Add StatusHandler to CRISPRessoCompare log

    * Add StatusHandler to CRISPRessoPooledWGSCompare log

    * Add StatusHandler to CRISPRessoWGS log

    * Rename `status.txt` to `CRISPResso_status.txt`

    * Modify status log names to match the tool they are generated from

    * Add percent_complete stages to CRISPRessoCORE

    These also include log statements of each plot that is being generated as well
    as fixing some variable name collisions with `ind`.

    * Format the percentage in the log to be 2 decimal places

    * Change all plotting logs from `info` to `debug` and simplify progress

    This refactors how the progress of the plots is calculated, making it much
    simpler. Before this change we would have had to keep track of the number of
    times `percent_complete` was output, but now it simply updates the percent
    complete after each amplicon is finished processing. Hopefully this will make
    things easier to maintain even though it will be a little less "accurate" (not
    sure how accurate the original implementation was...).

    * Implemented shared console log handler across all CRISPResso* calls

    This allows for easy changes to logging formatting, which was inspired by having
    to change the default logging level. The default logging level needs to be set
    at `logging.DEBUG` in order for the debug log statements to not be ignored for
    the running and status logs.

    * Add ability to set the verbosity level to each CRISPResso* tool

    This allows users to set a verbosity level between 1 and 4 using the
    `-v`/`--verbosity` CLI parameter. If the `--debug` flag is present, then the
    level will default to 4, being the most verbose.

    * Implement showing the last seen `percent_complete` when none is provided

    * Keep track of and log when multiple parallel runs are completed

    These changes modify `CRISPRessoMultiProcessing.run_crispresso_cmds` such that
    we can now display when a run is completed. This potentially breaks how
    signals and interrupts are handled with multiple runs happening, but this needs
    to be reviewed.

    * Add debug and percentage complete to CRISPRessoBatch

    * Add percent complete to CRISPRessoPooled

    * Add debug and percent_complete message to CRISPRessoAggregate

    * Add `percent_complete` to CRISPRessoCompare

    * Add `percent_complete` to CRISPRessoPooledWGSCompare

    * Add status and `percent_complete` to CRISPRessoMeta

    * Add `verbosity` arguments to CRISPRessoCompare and CRISPRessoPooledWGSCompare

    * Fixing documentation to match pooled headers

    * Header removal bug fix change documentation to guide_seq

    * Update documentation and help feature for CRISPRessoPooled

    * Remove extra newlines from CRISPRessoPooled -h

    * Make variable names as clear as my firstborn child's name

    * Update one more variable name

    * Fix bug to flow CRISPRessoPooled options to sub command

    * Make amplicon file args variable name clear

    * Update how parameters are set and retrieved from parameter object

    The refactor in the previous commit changed the type of the arguments to a
    dictionary which doesn't have the parameters as attributes, and this commit
    fixes that error.

    * Add note in output header for change in default CRISPRessoPooled

    In the next release (2.3.0) the `--demultiplex_only_at_amplicons` will be the
    default when running in mixed-mode. This is to allow for inexact alignments of
    the reads and the amplicons to the genome. For more context, see this issue
    https://github.com/pinellolab/CRISPResso2/issues/276

    * Clarify the verbosity parameter help message

    * Separate out parameters to `normalize_name` in CRISPRessoCORE

    * Separate out parameters to `normalize_name` in CRISPRessoWGS

    * Separate out parameters to `normalize_name` in CRISPRessoPooled

    * Separate out parameters to `normalize_name` in CRISPRessoCompare

    * Fix bug in CRISPRessoPooled by replacing `database_id` with `normalize_name`

    * Refactor `run_crispresso_cmds` to not require a `logger`

    This commit implements the functionality to make the `logger` object optional by
    seeing which module called the `run_crispresso_cmds` function and obtaining the
    correct object from that module name.

    The function also immediately returns when no commands are passed to it.

    * Add amplicon name to plotting debug statements in CRISPRessoCORE

    ---------

    Co-authored-by: Cole Lyman <[email protected]>
    Co-authored-by: Cole Lyman <[email protected]>
    Co-authored-by: Cole Lyman <[email protected]>
    Co-authored-by: Samuel Nichols <[email protected]>
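
Editor's sketch of the status-file idea described in the commit above: a logging handler that overwrites a file with the latest message on every log call. Only the class name `StatusHandler` and the file name `CRISPResso_status.txt` come from the commit message. The implementation below is an assumption and leaves out details such as `percent_complete`.

```python
import logging
import os
import tempfile


class StatusHandler(logging.Handler):
    """Overwrite a status file with the most recent log message."""

    def __init__(self, filename):
        super().__init__()
        self.filename = filename

    def emit(self, record):
        # Mode "w" truncates, so the file only ever holds the latest message.
        with open(self.filename, "w") as handle:
            handle.write(self.format(record) + "\n")


status_path = os.path.join(tempfile.mkdtemp(), "CRISPResso_status.txt")
logger = logging.getLogger("status_sketch")
logger.setLevel(logging.DEBUG)
logger.propagate = False
logger.addHandler(StatusHandler(status_path))

logger.info("Aligning reads")
logger.info("Making plots")

with open(status_path) as handle:
    latest_status = handle.read()
```

After the two log calls the file holds only the second message, so a reader of the file sees where the run currently is.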

commit ff7eca76e6a3a08af4ac18ac4e88d20f2a06b1f9
Author: Kendell Clement <[email protected]>
Date:   Thu Jan 26 15:27:27 2023 -0500

    CRISPRessoPooled custom header fix (#278)

    * Fixing documentation to match pooled headers

    * Header removal bug fix change documentation to guide_seq

    * Update documentation and help feature for CRISPRessoPooled

    * Remove extra newlines from CRISPRessoPooled -h

    * Make variable names as clear as my firstborn child's name

    * Update one more variable name

    Co-authored-by: Samuel Nichols <[email protected]>

commit 104866e1080c973bb025d1a5ba59b19dca1658af
Author: Cole Lyman <[email protected]>
Date:   Thu Jan 5 14:00:26 2023 -0700

    Fix deprecated numpy type names (fixes #269) (#270)

    In the most recent version of numpy (1.24) some of the types have been
    deprecated. This commit fixes these errors.

commit 58a8e42df88b66fad6b4f6ad04a5b9d9d43d01b4
Author: Cole Lyman <[email protected]>
Date:   Thu Jan 5 06:49:35 2023 -0700

    Add snippet about installing CRISPResso2 via bioconda on Apple silicon (#274)

    I have suffered enough trying to debug my installation, so hopefully this helps
    someone else.

    Co-authored-by: Cole Lyman <[email protected]>

commit b9851e98104602eb78c2b384105267624295e9d3
Author: Cole Lyman <[email protected]>
Date:   Thu Dec 22 13:30:23 2022 -0700

    Fix bug when pooled bam is input (#265)

    This change checks to see if a bam file was input, and if so it doesn't try to
    remove any intermediate files because there aren't any.

    Co-authored-by: Cole Lyman <[email protected]>

commit b822612642043e75a19042941f69b457ce51f517
Author: Kendell Clement <[email protected]>
Date:   Mon Dec 19 15:26:45 2022 -0500

    Delete vscode settings

commit b99aa624dec68ef7d19264340ce0cafa829625f4
Author: Kendell Clement <[email protected]>
Date:   Mon Dec 19 13:29:14 2022 -0500

    Clarify input param help for pooled bam

commit 3fae1e8b821ec6b1890bff6561fa8fa67dc49a04
Author: Kendell Clement <[email protected]>
Date:   Mon Dec 19 13:28:54 2022 -0500

    Fix #235 - Cigar string is * if read unaligned

    Previously, the bam would set the cigar string to 0 if the read was unaligned. This breaks the sam->bam conversion and causes the errors in #235.

commit c65ba07dc5a983453cdf7bb1e27005230dac6f1b
Author: Cole Lyman <[email protected]>
Date:   Thu Dec 8 13:48:17 2022 -0700

    Add deprecation notice (#260)

    * Add FLASh and Trimmomatic deprecation notice to CLI output

    * Add Edilytics email address to CLI output

commit 2a30e5a45f5350ee7c6435bce1cd4edc4d31668a
Author: Kendell Clement <[email protected]>
Date:   Tue Dec 6 12:16:19 2022 -0500

    Format filterReadsOnSequencePresence script

commit 9d764414edd88a46ad5e4f496e4f1c8d5d60ce3e
Author: Kendell Clement <[email protected]>
Date:   Fri Dec 2 22:12:54 2022 -0500

    Clarify default CRISPRessoPooled settings for use_legacy_bowtie2_options_string

commit 9ddea40f7f02b546941ddaa4c71fc5283075051a
Author: kclem <[email protected]>
Date:   Mon Nov 14 10:33:04 2022 -0500

    Add check for prime editing extension sequence in prime edited sequence

    If the user specifies prime_editing_override_prime_edited_ref_seq, that sequence may not contain the extension sequence (for example, if the extension sequence is not provided in the appropriate orientation), so check for that here. The extension sequence should be provided as the reverse complement of the prime edited sequence.

commit 152f2dd5001da7090641ee8a1326bde9f7e8104e
Author: kclem <[email protected]>
Date:   Wed Nov 9 11:53:41 2022 -0500

    Version bump to 2.2.11a

commit 9ed356e3a0c6c316d0860d121772f80ddca6de1d
Author: kclem <[email protected]>
Date:   Wed Nov 9 11:47:30 2022 -0500

    Add param to override prime editing sequence checks

    CRISPResso checks that prime editing guides are provided in the proper orientation (e.g. pegRNA 3'->5', spacer sequence 5'->3') and checks these orientations by alignment. Sometimes, the alignment can be better in the opposite direction, and this parameter allows these checks to be overridden. Otherwise, these checks would halt the program and produce the output 'The prime editing pegRNA spacer sequence appears to be given in the 3\'->5\' order. The prime editing pegRNA spacer sequence (--prime_editing_pegRNA_spacer_seq) must be given in the RNA 5\'->3\' order.'

commit 39dd80afb98a22b7edb6f801c363d86bb77eeb5b
Author: kclem <[email protected]>
Date:   Wed Nov 9 10:06:51 2022 -0500

    Update filterReadsOnSequencePresence.py

commit fe55526927e3fb6e17c9a8a6f59c7057bc1e14eb
Author: Kendell Clement <[email protected]>
Date:   Mon Nov 7 22:25:16 2022 -0500

    Add script to filter input based on sequence presence

commit 713e57a19c35180035ca35e11a5820065eda0198
Author: Kendell Clement <[email protected]>
Date:   Tue Oct 18 16:02:26 2022 -0400

    Allow spaces in read names for CRISPRessoWGS

commit 39ce008bdddccdd8229c0ba185dce78bc2f66968
Author: Cole Lyman <[email protected]>
Date:   Sat Oct 8 21:09:58 2022 -0600

    Fix typo of CRISPResssoPlot when plotting nucleotide quilt (#250)

commit 6a2b342c8503b7327c0a2414edfbd16912d60ca5
Author: Kendell Clement <[email protected]>
Date:   Sat Oct 8 23:08:47 2022 -0400

    Batch amplicon plots (#251)

    * Error out if HDR amplicon matches existing amplicon

    * Add check for amplicon sequence uniqueness

    * Fix bug with bam_input not having bam_output

    * Test for no returned lines in auto mode, version bump to 2.2.11

    * Fix pandas deprecation of df.append

commit 726b2b93d6e419a1b0aa6a968c97edc55b4cc5a8
Author: Kendell Clement <[email protected]>
Date:   Thu Oct 6 16:32:02 2022 -0400

    Fix CRISPRessoBatch plot pool bug when plots are suppressed

commit 7e5049c4dfb88cbc87c91935a91d1f51120a10c2
Author: Cole Lyman <[email protected]>
Date:   Wed Sep 21 21:04:51 2022 -0600

    Fix batch quilt plot name (#249)

    This fixes an incorrectly named allele quilt plot input in CRISPRessoBatch.

commit 1821ca5029c5a1485733f13ab3f2048b4f1fa04e
Author: Kendell Clement <[email protected]>
Date:   Thu Sep 15 15:49:08 2022 -0400

    Version bump to 2.2.10

commit c5f79aebfc1ae209f4ee320df250eed89a02787c
Author: Cole Lyman <[email protected]>
Date:   Wed Sep 14 14:24:55 2022 -0600

    Parallel plot refactor (#247)

    * Fix duplicate plotting in CRISPRessoBatch aggregate

    * Refactor multiprocessing plots in CRISPRessoBatch

    * Refactor multiprocessing plots in CRISPRessoCORE

    * Refactor multiprocessing plots for CRISPRessoAggregate

commit 4ed5e24e6cc1dd8068e2391573ae2438acd32db2
Author: Kendell Clement <[email protected]>
Date:   Tue Sep 13 14:12:11 2022 -0400

    print files in curr dir if Aggregate can't find files

commit ce25bc06f29988e7a10afd0b6a09ba0caf0950e0
Author: Kendell Clement <[email protected]>
Date:   Mon Sep 12 10:32:57 2022 -0400

    Spelling typo

commit c15f01c75083403f17c58c121b2afe97e9f2a1ec
Author: Kendell Clement <[email protected]>
Date:   Tue Sep 6 17:49:52 2022 -0400

    Add helper function to create alignment scoring matrix

    New scoring matrix can be created using CRISPResso2Align.make_matrix()

commit c80f82838c5a228b79ad4484092877cfee08e02c
Author: Cole Lyman <[email protected]>
Date:   Mon Aug 22 18:28:33 2022 -0600

    Add `zip_output` (#240)

    * Making zip of results

    * Zip command added, if zip is true place_report_in_output_folder is also true, zip removes all files while zipping

    * Adding --zip to compare and pooled/wgs compare

    * Add more formatting changes to CRISPRessoShared

    * Refactoring propagate_crispress_options so only one version exists

    * Zip added to arguments_to_ignore and warning added when changing arguments

    * Restore styling

    * Update README to include --zip

    * Rename --zip to --zip_output

    * Change --zip to --zip_output in CompareCORE and PooledWGSCompareCORE

    * Bug fix arg to args

    Co-authored-by: Samuel Nichols <[email protected]>

commit 5de3d7286d8e33c7cf4d3615fce715806e72f511
Author: Kendell Clement <[email protected]>
Date:   Thu Aug 11 21:42:34 2022 -0400

    Fix fix to aggregate for CRISPRessoWGS

commit a2294c266f43b14969a5d6474076f31a77a57173
Author: Kendell Clement <[email protected]>
Date:   Thu Aug 11 21:40:50 2022 -0400

    Fix bug in aggregate for WGS

commit 7ce3eb4abe4b8ceac933272ac9cb16a8bedf26a3
Author: Kendell Clement <[email protected]>
Date:   Mon Aug 8 21:53:45 2022 -0400

    Update CRISPRessoWGS to allow non-word characters in region names

commit 040ac0033d6e250f4e3a412101874cf5e914e08a
Author: kclem <[email protected]>
Date:   Mon Aug 8 16:04:59 2022 -0400

    Enable processing of cram files by CRISPRessoWGS

    Adds --reference to samtools view when viewing cram files

commit cf112a0caba8789e28530cc09171285ec6ea9b4c
Author: kclem <[email protected]>
Date:   Mon Aug 8 14:55:46 2022 -0400

    Auto amplicon detection for interleaved input

    Enables processing of interleaved fastq files for guess_guides and guess_amplicons, as well as get_most_frequent_reads. When interleaved input is present, the input is first separated into R1/R2 files, then processing is performed.
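
Editor's sketch of the separation step described in the commit above, assuming four lines per FASTQ record and strictly alternating R1/R2 records. `split_interleaved` is a hypothetical helper that works on in-memory lines; the actual implementation works on files and may differ.

```python
def split_interleaved(lines):
    """Split interleaved FASTQ lines into (r1_lines, r2_lines).

    Assumes 4 lines per record and strictly alternating R1/R2 records.
    """
    if len(lines) % 8 != 0:
        raise ValueError("expected complete R1/R2 record pairs (8 lines each)")
    r1, r2 = [], []
    for start in range(0, len(lines), 8):
        r1.extend(lines[start:start + 4])
        r2.extend(lines[start + 4:start + 8])
    return r1, r2


interleaved = [
    "@read1/1", "ACGT", "+", "IIII",
    "@read1/2", "TGCA", "+", "IIII",
    "@read2/1", "GGCC", "+", "IIII",
    "@read2/2", "CCGG", "+", "IIII",
]
r1_lines, r2_lines = split_interleaved(interleaved)
```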

commit 4ba524dc7b947feca8a0f743837844f9febc2171
Author: Cole Lyman <[email protected]>
Date:   Thu Aug 4 11:32:11 2022 -0600

    Potential fix for aggregate plots in Batch mode (#237)

commit 6097a8a104d3f156ef7c08e196ac37e32bf04c71
Author: Kendell Clement <[email protected]>
Date:   Thu Jul 21 22:45:48 2022 -0400

    Fix pct_vectors in crispresso2_info json object

commit 65a079d86d6f386793397398f839c46014b54543
Author: Kendell Clement <[email protected]>
Date:   Wed Jul 20 23:46:37 2022 -0400

    Fix more readme spelling bugs

commit e817376ecd54cdea1f29e303ca25b9e7d1d38333
Author: Kendell Clement <[email protected]>
Date:   Wed Jul 20 23:42:23 2022 -0400

    Fix bug in readme spelling

commit 49740ba1d66ed6d13a9e154b8b17bc8b5186581d
Author: Kendell Clement <[email protected]>
Date:   Wed Jul 20 16:10:09 2022 -0400

    Fix loading of crispresso info from WGS and Pooled

commit b68a43271115251b18e8955e285ccc18f549e8cd
Author: Kendell Clement <[email protected]>
Date:   Thu Jul 14 14:11:04 2022 -0400

    Add plotly to dockerfile

commit b0b7d41d697304d0d5fc93e3346c9de1b98ba41d
Author: Kendell Clement <[email protected]>
Date:   Thu Jul 14 14:10:00 2022 -0400

    Fix #231 Allow N's in bam output (Try 2)

commit c460b3e73fd06a230dbac2e37c86b833144ebf94
Author: Kendell Clement <[email protected]>
Date:   Thu Jul 14 14:09:10 2022 -0400

    Revert "Fix #231 Allow N's in bam output"

    This reverts commit 2f6ad1dbe05210af9ccc1b1f17783cd212a888d3.

commit 2f6ad1dbe05210af9ccc1b1f17783cd212a888d3
Author: Kendell Clement <[email protected]>
Date:   Thu Jul 14 13:52:37 2022 -0400

    Fix #231 Allow N's in bam output

commit 0a2419e518dc9b3520058c3927f98b31cd51347e
Author: Cole Lyman <[email protected]>
Date:   Fri Jul 8 21:10:01 2022 -0600

    Fix bug when name is provided instead of amplicon_name in pooled input file (#229)

    Also, raise an exception (instead of incorrectly executing) when there are not
    enough matched parameters in the pooled input file.

commit cb58212379803788c04ca5793baaa760cbbeaa81
Author: Cole Lyman <[email protected]>
Date:   Fri Jul 8 21:09:49 2022 -0600

    Fix bug when comparing two samples with the same name. (#228)

commit e8a796f5f451409cbafed4404dfba4b6b8a124ca
Author: Kendell Clement <[email protected]>
Date:   Thu Jun 23 21:30:23 2022 -0400

    Version bump to 2.2.9

commit 632143ddedea48bab9229baeb4bf3ea4d1f658d6
Author: Cole Lyman <[email protected]>
Date:   Mon Jun 20 19:53:14 2022 -0600

    Don't run global frameshift plot when there are no reads (#226)

    When there are no reads (i.e. global_MODIFIED_FRAMESHIFT +
    global_MODIFIED_NON_FRAMESHIFT + global_NON_MODIFIED_NON_FRAMESHIFT == 0) there
    was a bug when trying to compute the pie chart, because all of the values in the
    pie chart are 0. This fix makes sure that there is at least one read so that
    the plot can be constructed properly.
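
    The guard amounts to something like the following sketch. The function
    name `should_plot_global_frameshift` is hypothetical; the three counts
    mirror the variables named in the message above.

```python
def should_plot_global_frameshift(modified_frameshift,
                                  modified_non_frameshift,
                                  non_modified_non_frameshift):
    """Return True only when at least one read contributes to the pie chart."""
    total = (modified_frameshift
             + modified_non_frameshift
             + non_modified_non_frameshift)
    # A pie chart whose slices are all zero cannot be drawn, so skip it.
    return total > 0
```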

commit 4bb06218e835d2624d53fd401542caef6f8a3a55
Author: kclem <[email protected]>
Date:   Fri Jun 3 16:57:02 2022 -0400

    Improvements for guide inference in 'auto' mode

    In 'auto' mode, a putative guide sequence is selected at the site of maximal editing. If the site of maximal editing happens near the end of the guide (e.g. base 0), many things will break (e.g. quantification windows). This update excludes bases from being used to find the guide using the --exclude_bp_from_left and --exclude_bp_from_right parameters. By default, these parameters are 15bp, so the first and last 15bp cannot be selected as the site of maximal editing and thus cannot be the site of a guide sequence. In addition, the site of maximal editing must have 3x the magnitude of the background.
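
    A rough sketch of the selection rule described above, under stated
    assumptions: `find_max_editing_site` is a hypothetical name, the input is
    a per-base list of modification frequencies, and "background" is taken to
    be the mean of the other candidate positions (the commit does not say how
    it is computed).

```python
def find_max_editing_site(mod_freqs, exclude_left=15, exclude_right=15,
                          fold_over_background=3.0):
    """Return the index of the most-edited base, or None if no site stands out."""
    start, end = exclude_left, len(mod_freqs) - exclude_right
    if end - start < 2:
        return None  # nothing left to compare after trimming the ends
    window = mod_freqs[start:end]
    best_offset = max(range(len(window)), key=window.__getitem__)
    peak = window[best_offset]
    # Assumption: background = mean of the other candidate positions.
    background = (sum(window) - peak) / (len(window) - 1)
    if peak <= 0 or peak < fold_over_background * background:
        return None
    return start + best_offset
```

    With the default 15bp exclusions, a strong signal at base 0 is ignored
    and only peaks inside the trimmed window can be returned.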

commit 9d64de187835b2553ad2b4374d32edab27f83645
Author: Kendell Clement <[email protected]>
Date:   Thu Jun 2 20:22:25 2022 -0400

    Update README.md

commit 6aafc5387986f5089ba55b68d128343d68052792
Author: Simon P Shen <[email protected]>
Date:   Tue May 31 17:42:53 2022 -0400

    directory in quotes in batch cmd (#222)

    Add quotes around output folder for folders that have spaces.
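
    One way to get the same effect is sketched below. It swaps in
    `shlex.quote` rather than literal quote characters, and the helper name
    `build_crispresso_cmd` is hypothetical; `-o` is assumed to be the output
    folder option.

```python
import shlex


def build_crispresso_cmd(base_cmd, output_folder):
    """Append an output folder to a command string, quoting it for the shell."""
    # shlex.quote leaves simple paths alone and wraps anything with spaces.
    return f"{base_cmd} -o {shlex.quote(output_folder)}"
```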

commit 432f163ac68b9a650d1fd326171aadc505ee87f4
Author: Kendell Clement <[email protected]>
Date:   Tue May 24 23:38:36 2022 -0400

    CRISPRessoBatch fills NA values in batch settings

    NA values in CRISPRessoBatch are filled with the value from args - either the default value or the value from the command line args (if set)
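
    The filling rule can be sketched in plain Python as below. This is an
    illustration, not the CRISPRessoBatch implementation: the function name
    `fill_missing_settings` and the row/dict layout are assumptions.

```python
def fill_missing_settings(batch_rows, arg_values):
    """Fill None/empty/"NA" settings in each batch row from arg_values."""
    missing = (None, "", "NA")
    filled = []
    for row in batch_rows:
        new_row = dict(row)  # do not mutate the caller's rows
        for key, fallback in arg_values.items():
            # arg_values holds the command-line value if set, else the default.
            if new_row.get(key) in missing:
                new_row[key] = fallback
        filled.append(new_row)
    return filled
```

    Values that are present in the batch file, including 0, are kept as they
    are.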

commit 6de774adbad3aa8cd99d07b0ba7692984b356cd4
Author: kclem <[email protected]>
Date:   Mon May 23 14:18:02 2022 -0400

    Fix file naming bug for HDR outputs

    In html file, figures 4e and 4f incorrectly referenced figure 4d. This fixes this bug.

commit b88fec0668a4082a12ead3d26582e86d829dd7cc
Author: Kendell Clement <[email protected]>
Date:   Sat May 21 00:32:15 2022 -0400

    For bam_output, fix bug that wrote unaligned lines twice

commit 3564e77ebcdedb4b01cc01dcca18ba3221fac67c
Author: Kendell Clement <[email protected]>
Date:   Thu May 19 16:32:18 2022 -0400

    Update README with CRISPRessoPooled headers and bam_output parameters

commit bc08d81f17cb1929d1c37a1773cffcf36fb12fe2
Author: Kendell Clement <[email protected]>
Date:   Thu May 19 16:11:30 2022 -0400

    Add more links to tools

commit 006c497a379ecd94b017a883a5db887861e1586a
Author: Kendell Clement <[email protected]>
Date:   Thu May 19 16:08:14 2022 -0400

    Add links to tools

commit dc8243373ad00d6bd467fc30c59942596ff0c5d6
Author: Kendell Clement <[email protected]>
Date:   Mon May 16 21:38:06 2022 -0400

    fastq_to_bam implementation (#219)

commit e88b6833977c6b2768299e0b2e7af623e3a9ae7c
Author: Kendell Clement <[email protected]>
Date:   Sun May 8 02:14:13 2022 -0400

    Fix bug for when guides don't agree in CRISPRessoAggregate

commit 7eb763116a8c60603f1cd654645215767ee8eb52
Author: Kendell Clement <[email protected]>
Date:   Thu May 5 03:28:21 2022 -0400

    Fix bug for case of empty summary plots in report generation

commit 0324fa67d14ed945f0c9531d9bcf73ebcf4ca042
Author: Kendell Clement <[email protected]>
Date:   Thu May 5 03:28:02 2022 -0400

    Create report for number of significant bases in CRISPRessoCompare

commit e3c9d0026a9ee6732f3ed6bdcf2a824850d7e66a
Author: Kendell Clement <[email protected]>
Date:   Wed May 4 22:43:11 2022 -0400

    Update pickle to json in readme and CRISPRessoPooledWGSCompare

commit 1553f7977c12bf1091a20ca55b878bccfb739b61
Author: Kendell Clement <[email protected]>
Date:   Wed May 4 18:10:04 2022 -0400

    Merge pull request #4 from pinellolab/master (#218)

commit bcecbfc047d294e26f381a6668e08cb4db24445c
Merge: 15b0e05b bb13e007
Author: Kendell Clement <[email protected]>
Date:   Wed May 4 18:06:37 2022 -0400

    Merge branch 'master' into master

commit bb13e007738d6e7a4909e01f03daff592f334f36
Merge: af4ab6e8 d0b41483
Author: Kendell Clement <[email protected]>
Date:   Wed May 4 17:59:32 2022 -0400

    Merge branch 'master' of https://github.com/edilytics/CRISPResso2

commit 15b0e05b9e03bbec5236e58776ddf9aa2f93180e
Author: Kendell Clement <[email protected]>
Date:   Wed May 4 17:54:52 2022 -0400

    2 flexible pooled input (#217)

    * Batch type coerce and r2 file check

    * Upgrade tabs for bootstrap5

    * Update readme with additional pooled amplicon file headers

    Co-authored-by: Samuel Nichols <[email protected]>

commit d0b41483bee704940ba60c58289f412b04c71659
Author: Kendell Clement <[email protected]>
Date:   Wed May 4 13:43:43 2022 -0400

    Update README.md

commit ce49fab5301cb73ba0daf6c765e350eb083c76f1
Merge: 5f909713 b913fcb4
Author: Kendell Clement <[email protected]>
Date:   Wed May 4 13:40:30 2022 -0400

    Merge pull request #3 from edilytics/2-flexible-pooled-input

    Add flexibility to CRISPRessoPooled amplicon input by allowing headers. Also, prime editing and quantification window coordinate parameters can be passed to CRISPRessoPooled.

commit b913fcb402a8ba3106c3ff7913563a33d8d19fca
Author: Kendell Clement <[email protected]>
Date:   Wed May 4 13:38:25 2022 -0400

    Update CRISPRessoPooledCORE.py

    Replace process to read header, increase flexibility for column order

commit 945bf31f16530b7ce25b89095b2c7005bf146117
Merge: 7b8f6788 5f909713
Author: Kendell Clement <[email protected]>
Date:   Wed May 4 12:45:24 2022 -0400

    Merge branch 'master' into 2-flexible-pooled-input

commit 5f9097133765736a7c2fe3c8e9b730845fed0b70
Author: Kendell Clement <[email protected]>
Date:   Wed May 4 12:23:44 2022 -0400

    Version bump to 2.2.8

commit c4a94ce0e06c6ebae13e128fbe6b708e635121c4
Author: Kendell Clement <[email protected]>
Date:   Wed May 4 00:13:17 2022 -0400

    Fix summary plot representation for multi reports

    * fixed old reference to make_multi_report which called old summary plot format
    * renamed summary_plot to summary_plots to reflect a dict with multiple plots

commit 62900e9ae6fa37ce99a04f12a63ed5c912f75042
Author: Cole Lyman <[email protected]>
Date:   Tue May 3 20:47:52 2022 -0600

    Large aggregation (#192)

    * Squashed commit of the following:

    commit 8564eb03f0d9e62abf4b7528baf5c2ae296be8f9
    Merge: f6ef62c 07cc7d8
    Author: Kendell Clement <[email protected]>
    Date:   Tue Jan 11 16:20:15 2022 -0500

        Merge branch 'indel-alignment-fix' of https://github.com/edilytics/CRISPResso2 into indel-alignment-fix

    commit 07cc7d856ab3fcbbaa5381f17f29568192388887
    Author: Cole Lyman <[email protected]>
    Date:   Fri Dec 10 15:29:59 2021 -0700

        Fix bug in `find_indels_substitutions`

        This bug occurred when there was a deletion at the end of a sequence, and was
        thus not properly accounted for.

    commit f6ef62cfdf909adac1b10ea86555cd218f8b2a74
    Author: Cole Lyman <[email protected]>
    Date:   Fri Dec 10 15:29:59 2021 -0700

        Fix bug in `find_indels_substitutions`

        This bug occurred when there was a deletion at the end of a sequence, and was
        thus not properly accounted for.

    commit 7212f87f4be60057a6c848947ff6b5efde132a25
    Author: Cole Lyman <[email protected]>
    Date:   Fri Dec 10 15:26:17 2021 -0700

        Add a unit test for `find_indels_substitutions`

        This unit test checks for deletions at the end of a sequence, which are
        inherently outside of the include_indx_set window.

    commit d50b4e903b973c71a275e31d470b40e59280ee13
    Author: Cole Lyman <[email protected]>
    Date:   Fri Dec 10 15:03:22 2021 -0700

        Fix a bug in `find_indels_substitutions`

        The bug that this commit fixes is when an insertion occurs at the edge of the
        include indexes. The trouble with this earlier was that it was using the `idx`
        to calculate the size of the insertion, but the `idx` wasn't being incremented
        anymore because it was outside of the include window.

    commit 4db066f7bc333b7662a9232ac732ebb33ac3ace8
    Author: Cole Lyman <[email protected]>
    Date:   Fri Dec 10 15:01:39 2021 -0700

        Add test case for `find_indels_substitutions`

        This test case is extracted from the CRISPRessoBatch integration test and
        provides an example where there is an insertion at the edge of the include
        index.

    commit 3b3a7417f5bbd6c2785a2af54a47e01d2e820451
    Author: Cole Lyman <[email protected]>
    Date:   Fri Dec 10 11:37:07 2021 -0700

        Fix bug in CRISPRessoCompare where sample names were not properly set

        This was a place where it was (partially) missed during the crispresso2_info
        object refactoring.

    commit e9f5eff3d95b676b5ee2e23371a5604f600d34b2
    Author: Cole Lyman <[email protected]>
    Date:   Fri Dec 10 15:26:17 2021 -0700

        Add a unit test for `find_indels_substitutions`

        This unit test checks for deletions at the end of a sequence, which are
        inherently outside of the include_indx_set window.

    commit d4d45a918254ab19a7e7956e9e731389c6f36ecb
    Author: Cole Lyman <[email protected]>
    Date:   Fri Dec 10 15:03:22 2021 -0700

        Fix a bug in `find_indels_substitutions`

        The bug that this commit fixes is when an insertion occurs at the edge of the
        include indexes. The trouble with this earlier was that it was using the `idx`
        to calculate the size of the insertion, but the `idx` wasn't being incremented
        anymore because it was outside of the include window.

    commit 13f00bb40239c83e6e5cf844561fdb7000d3d9ab
    Author: Cole Lyman <[email protected]>
    Date:   Fri Dec 10 15:01:39 2021 -0700

        Add test case for `find_indels_substitutions`

        This test case is extracted from the CRISPRessoBatch integration test and
        provides an example where there is an insertion at the edge of the include
        index.

    commit 659ae34e8fd106f7ecc163b5bea0b5a80ab0283c
    Author: Cole Lyman <[email protected]>
    Date:   Fri Dec 10 11:37:07 2021 -0700

        Fix bug in CRISPRessoCompare where sample names were not properly set

        This was a place where it was (partially) missed during the crispresso2_info
        object refactoring.

    * Add parameter `--suppress_batch_summary_plots`

    If many runs are run at the same time, batch summary plots may fail because they are too large for matplotlib. This parameter `--suppress_batch_summary_plots` allows individual runs to be plotted, but suppresses batch summary plots that may otherwise be too big.

    * Pep formatting cleanup

    * Add summary nucleotide plots to aggregate

    * Aggregate plots are paginated

    * Update CRISPRessoAggregateCORE.py

    Remove max sample limit for plotting

    * Add --max_samples_per_summary_plot to CRISPRessoAggregate

    Parameterize the max number of samples to plot on each page of reports. Additional PDFs will be created with this number of samples on them.

    * Add plotly function to plot an interactive heatmap

    * Fix deprecated numpy type to suppress warning

    * Add plotting of heatmaps to CRISPRessoAggregateCORE to summarize modification types

    These heatmaps are interactive (zoomable and pannable) and show for each sample
    the percentage of insertions, substitutions, and deletions.

    * Add the heatmap summaries to the CRISPRessoAggregate report

    * Update Bootstrap to 5.1.3

    This is mainly so that we can use the fullscreen modal functionality in this version.

    * Move the plotly heatmaps to a Bootstrap modal

    * Fix bug where plots were not filling up entire modal.

    I have tried countless different ways for this to work, and this is the best
    that I can come up with. After the modal is opened it triggers the plot to
    resize, and then for some reason you need to trigger the resize event. I think
    this is because a `div` changing size won't actually trigger the resizing of the
    plot (and neither will just calling `Plotly.Plots.resize`...?!).

    * Update the axis labels and add autosize to plotly heatmaps

    I'm pretty sure the autosize doesn't do anything, but it is there for good
    measure.

    * Abandon attempts to make plots fullscreen

    This includes removing the Bootstrap modal (two out of the three plots would
    resize properly and I couldn't figure out a way to have the plot displayed
    outside of the modal). I have left in some javascript to make the plot
    fullscreen, but I couldn't get the formatting quite right and the plot wasn't
    much bigger in the fullscreen version because there was a ton of space between
    the plot and the heatmap. If some brave soul would like to tackle it, feel free!

    * Rename and refactor how plot data is passed around

    I have consolidated how the plot data is passed around, so that now you can pass
    in only one dict with all of the information instead of 4 or 5 separate
    parameters. I also renamed the `heatmap_plot_*` to
    `allele_modification_heatmap_*`.

    * Implement the line plot version of the modification percentages

    This also includes correctly resizing the plot when the line plot tab is
    selected!

    * Change default `max_samples_per_summary_plot` to be 150 instead of 250

    * Remove extra assignments of `this_number_samples` and suppress plot

    The plot that is suppressed is the large nucleotide quilt when there is a large
    number of samples. Is it okay to suppress this plot @kclem?

    * Implement parallel plotting in CRISPRessoAggregate

    * Fix sample indexing error and heatmap scaling for large number of samples

    * Add plotly requirement to setup.py

    * Remove space around vertical barcharts

    * Add scrollbar to long images in multiReport

    * Fill in default (empty) values to allele modification plots

    When not running CRISPRessoAggregate, default values for the
    `allele_modification_heatmap_plot` and `allele_modification_lin_plot`
    dictionaries will be set so that the template can be properly rendered.

    * Include CRISPRessoBatch in the refactor of how summary_plot dicts are handled

    * Update dockerfile for new docker

    * minor bug fixes for plotCustomAllelePlot.py to work with Python3 (#212)

    * Allow for flexible parsing of quant window coordinates

    * CRISPRessoPooled debug flash command, fix pep formatting

    * Set flexiguide homology parameter type to int

    * Coerce ints in batch file checking (#200)

    * Batch type coerce and r2 file check

    * Revert "Batch type coerce and r2 file check"

    This reverts commit f91736688ea9739cf3063e3601c52ad6da1116a4.

    * Coerce int values

    * Handle multiple qwcs in batch mode

    If multiple qwcs were provided in batch mode, a parsing error would occur. This fixes this bug.

    * Fix bug from old pandas for int cols

    Evidently old pandas versions throw an error if a column doesn't exist. This checks to see if the column exists before the values are set.

    * Create allele modification heatmaps and line plots in CRISPRessoBatch

    * Add allele modification heatmaps and line plots to CRISPRessoBatch

    * Make all plots in CRISPRessoBatch run in parallel

    * Make `--suppress_batch_summary_plots` store true

    Also, only open and shutdown the process pool when necessary.

    * Add blank values for allele_modification entries when not present

    Co-authored-by: Kendell Clement <[email protected]>
    Co-authored-by: dharjanto <[email protected]>
    Co-authored-by: Samuel Nichols <[email protected]>
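
    The pagination controlled by `--max_samples_per_summary_plot` in the list
    above can be sketched as simple chunking. The helper name
    `paginate_samples` is hypothetical; the default of 150 follows the value
    stated in this commit message.

```python
def paginate_samples(sample_names, max_samples_per_summary_plot=150):
    """Split sample names into pages of at most max_samples_per_summary_plot."""
    if max_samples_per_summary_plot < 1:
        raise ValueError("max_samples_per_summary_plot must be at least 1")
    step = max_samples_per_summary_plot
    # Each page would become one summary plot (and one PDF) in the report.
    return [sample_names[i:i + step] for i in range(0, len(sample_names), step)]
```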

commit f67376fc9ab0e407d4086aa42fd1c77706ebc9c0
Author: Kendell Clement <[email protected]>
Date:   Fri Apr 15 00:46:30 2022 -0400

    Fix bug from old pandas for int cols

    Evidently old pandas versions throw an error if a column doesn't exist. This checks to see if the column exists before the values are set.

commit b34fe2956ff88629809b2434878028723dfc4895
Author: Kendell Clement <[email protected]>
Date:   Thu Apr 14 23:58:07 2022 -0400

    Handle multiple qwcs in batch mode

    If multiple qwcs were provided in batch mode, a parsing error would occur. This fixes this bug.

commit c94e3b9f2e301bda91e9c1e6f4ef794b33b5dbf0
Author: Samuel Nichols <[email protected]>
Date:   Thu Apr 14 21:48:32 2022 -0600

    Coerce ints in batch file checking (#200)

    * Batch type coerce and r2 file check

    * Revert "Batch type coerce and r2 file check"

    This reverts commit f91736688ea9739cf3063e3601c52ad6da1116a4.

    * Coerce int values

commit fc4542491bb86eb143db0044a848a56234403496
Author: Kendell Clement <[email protected]>
Date:   Thu Apr 14 22:13:23 2022 -0400

    Set flexiguide homology parameter type to int

commit 23fe2aa8e26067d1bcf36bfafc67e023c7588d2f
Author: Kendell Clement <[email protected]>
Date:   Thu Apr 14 22:12:37 2022 -0400

    CRISPRessoPooled debug flash command, fix pep formatting

commit d292d33d8c1fa3bfd2cee656643fd47bcdab161d
Author: Kendell Clement <[email protected]>
Date:   Thu Apr 14 22:00:19 2022 -0400

    Allow for flexible parsing of quant window coordinates

commit e1667cb53a7ea6fbb33369c8530a78639ed423ec
Author: dharjanto <[email protected]>
Date:   Mon Apr 11 22:08:21 2022 -0400

    minor bug fixes for plotCustomAllelePlot.py to work with Python3 (#212)

commit 7b8f6788da18f6ab173fa3c3d10f4ab6bb2acc26
Author: Samuel Nichols <[email protected]>
Date:   Fri Apr 8 10:21:00 2022 -0600

    Update README

commit 9bc24cd0474ed9f398dff64274d3181c4b2f8637
Author: Samuel Nichols <[email protected]>
Date:   Tue Mar 29 11:25:09 2022 -0600

    Using Amplicon_Name

commit 88ac5d72074b3da63de035e02c911ce34cd29414
Merge: b6057a2d e5afa478
Author: Samuel Nichols <[email protected]>
Date:   Mon Mar 28 22:32:09 2022 -0600

    Merge remote-tracking branch 'origin/master' into 2-flexible-pooled-input

commit b6057a2d54cb8637ff0900416de8e2de72213f76
Author: Samuel Nichols <[email protected]>
Date:   Mon Mar 28 20:53:05 2022 -0600

    Printing info statements for matched headers

commit af4ab6e8507d7aa4b7b68f217a458e0d9c966f55
Merge: bbb7d6f0 51a943c3
Author: Cole Lyman <[email protected]>
Date:   Fri Mar 25 09:44:13 2022 -0600

    Merge branch 'pinellolab:master' into master

commit 3c1eb012fc02563e3e963f17a62c7e932f5bcddc
Author: Samuel Nichols <[email protected]>
Date:   Thu Mar 24 12:31:43 2022 -0600

    Debugging and column checking

commit 0b47acbc592a6df6adf14641357b2104b76be691
Author: Samuel Nichols <[email protected]>
Date:   Wed Mar 23 09:42:51 2022 -0600

    New variables added to pooled

commit a0ff3a44d6d19d7b37f91919b5c0180206f72d53
Author: Samuel Nichols <[email protected]>
Date:   Mon Mar 21 09:32:28 2022 -0600

    Read as string not bytes

commit 710675fc3c0307e21103abd604315b47ff80a894
Author: Samuel Nichols <[email protected]>
Date:   Wed Mar 16 13:51:30 2022 -0600

    Adding command building for new options

commit f386818a48e5c840bd567611e6f1320c8146cac7
Author: Samuel Nichols <[email protected]>
Date:   Wed Mar 16 10:08:33 2022 -0600

    Comment out df_template.iloc instance

commit eb5e309da57c8b96cd760728ddbf67be05f30d1c
Author: Samuel Nichols <[email protected]>
Date:   Wed Mar 16 09:59:19 2022 -0600

    Potential solution for flexible headers

commit 51a943c3a8f8181963acc420e75a5e8ee103cf7c
Author: Kendell Clement <[email protected]>
Date:   Tue Mar 15 11:00:46 2022 -0400

    CRISPRessoPooled pep formatting and fix

    CRISPRessoPooled doesn't re-count reads if it has been run once and the `aligned_pooled_bam` is provided as input
    pep code formatting changes

commit bbb7d6f0907aa13518d20e7f470e7de518b825f4
Merge: ddbd39f0 5a10d638
Author: Kendell Clement <[email protected]>
Date:   Tue Mar 15 10:23:38 2022 -0400

    Merge branch 'master' of https://github.com/edilytics/CRISPResso2

commit 5a10d638c638f21f8a2934955e92ef7e117b889e
Author: Kendell Clement <[email protected]>
Date:   Sat Feb 26 14:21:57 2022 -0500

    Move metadata for bam input and output

commit e5afa4784d5330a1dc95c5deafcd9217edeac631
Author: Samuel Nichols <[email protected]>
Date:   Wed Feb 16 10:20:24 2022 -0700

    Coerce int values

commit ede7d85b50055311908000578c76a1860ae9de4d
Author: Samuel Nichols <[email protected]>
Date:   Wed Feb 16 10:18:29 2022 -0700

    Revert "Batch type coerce and r2 file check"

    This reverts commit f91736688ea9739cf3063e3601c52ad6da1116a4.

commit f91736688ea9739cf3063e3601c52ad6da1116a4
Author: Samuel Nichols <[email protected]>
Date:   Wed Feb 16 10:10:52 2022 -0700

    Batch type coerce and r2 file check

commit 7b4a310b0f8b64c00e02eca3d522ad50d39b43ae
Author: Kendell Clement <[email protected]>
Date:   Tue Feb 15 22:18:05 2022 -0500

    Reiterate WGS region file is tab-separated

    Add note to WGS description that region file should be tab-separated. Closes #199

commit b8497542e388ad401d0815d426f27abc3201a76d
Author: kclem <[email protected]>
Date:   Fri Feb 11 15:07:14 2022 -0500

    Extend x-axis to longest scaffold incorporation length

commit ab7248947afade089809c74bfe6e9d5394e8f6dc
Author: kclem <[email protected]>
Date:   Wed Feb 9 17:05:11 2022 -0500

    Fix prime editing indexing for plots

commit ddbd39f06b262d5ebd2cc69e116c08b22b6bd84e
Merge: a7ffd468 442a48c7
Author: Kendell Clement <[email protected]>
Date:   Thu Jan 13 15:35:36 2022 -0500

    Merge branch 'pinellolab:master' into master

commit 442a48c7f4c62ec2ebc95fe268475e5e2a4b2f0c
Author: Cole Lyman <[email protected]>
Date:   Tue Jan 11 15:28:28 2022 -0700

    Indel alignment fix (#182)

    * Fix bug in CRISPRessoCompare where sample names were not properly set

    This was a place where it was (partially) missed during the crispresso2_info
    object refactoring.

    * Add test case for `find_indels_substitutions`

    This test case is extracted from the CRISPRessoBatch integration test and
    provides an example where there is an insertion at the edge of the include
    index.

    * Fix a bug in `find_indels_substitutions`

    The bug that this commit fixes is when an insertion occurs at the edge of the
    include indexes. The trouble with this earlier was that it was using the `idx`
    to calculate the size of the insertion, but the `idx` wasn't being incremented
    anymore because it was outside of the include window.

    * Add a unit test for `find_indels_substitutions`

    This unit test checks for deletions at the end of a sequence, which are
    inherently outside of the include_indx_set window.

    * Fix bug in `find_indels_substitutions`

    This bug occurred when there was a deletion at the end of a sequence, and was
    thus not properly accounted for.

    * Squashed commit of the following:

    commit 8564eb03f0d9e62abf4b7528baf5c2ae296be8f9
    Merge: f6ef62c 07cc7d8
    Author: Kendell Clement <[email protected]>
    Date:   Tue Jan 11 16:20:15 2022 -0500

        Merge branch 'indel-alignment-fix' of https://github.com/edilytics/CRISPResso2 into indel-alignment-fix

    commit 07cc7d856ab3fcbbaa5381f17f29568192388887
    Author: Cole Lyman <[email protected]>
    Date:   Fri Dec 10 15:29:59 2021 -0700

        Fix bug in `find_indels_substitutions`

        This bug occurred when there was a deletion at the end of a sequence, and was
        thus not properly accounted for.

    commit f6ef62cfdf909adac1b10ea86555cd218f8b2a74
    Author: Cole Lyman <[email protected]>
    Date:   Fri Dec 10 15:29:59 2021 -0700

        Fix bug in `find_indels_substitutions`

        This bug occurred when there was a deletion at the end of a sequence, and was
        thus not properly accounted for.

    commit 7212f87f4be60057a6c848947ff6b5efde132a25
    Author: Cole Lyman <[email protected]>
    Date:   Fri Dec 10 15:26:17 2021 -0700

        Add a unit test for `find_indels_substitutions`

        This unit test checks for deletions at the end of a sequence, which are
        inherently outside of the include_indx_set window.

    commit d50b4e903b973c71a275e31d470b40e59280ee13
    Author: Cole Lyman <[email protected]>
    Date:   Fri Dec 10 15:03:22 2021 -0700

        Fix a bug in `find_indels_substitutions`

        The bug that this commit fixes is when an insertion occurs at the edge of the
        include indexes. The trouble with this earlier was that it was using the `idx`
        to calculate the size of the insertion, but the `idx` wasn't being incremented
        anymore because it was outside of the include window.

    commit 4db066f7bc333b7662a9232ac732ebb33ac3ace8
    Author: Cole Lyman <[email protected]>
    Date:   Fri Dec 10 15:01:39 2021 -0700

        Add test case for `find_indels_substitutions`

        This test case is extracted from the CRISPRessoBatch integration test and
        provides an example where there is an insertion at the edge of the include
        index.

    commit 3b3a7417f5bbd6c2785a2af54a47e01d2e820451
    Author: Cole Lyman <[email protected]>
    Date:   Fri Dec 10 11:37:07 2021 -0700

        Fix bug in CRISPRessoCompare where sample names were not properly set

        This was a place where it was (partially) missed during the crispresso2_info
        object refactoring.

    commit e9f5eff3d95b676b5ee2e23371a5604f600d34b2
    Author: Cole Lyman <[email protected]>
    Date:   Fri Dec 10 15:26:17 2021 -0700

        Add a unit test for `find_indels_substitutions`

        This unit test checks for deletions at the end of a sequence, which are
        inherently outside of the include_indx_set window.

    commit d4d45a918254ab19a7e7956e9e731389c6f36ecb
    Author: Cole Lyman <[email protected]>
    Date:   Fri Dec 10 15:03:22 2021 -0700

        Fix a bug in `find_indels_substitutions`

        The bug that this commit fixes is when an insertion occurs at the edge of the
        include indexes. The trouble with this earlier was that it was using the `idx`
        to calculate the size of the insertion, but the `idx` wasn't being incremented
        anymore because it was outside of the include window.

    commit 13f00bb40239c83e6e5cf844561fdb7000d3d9ab
    Author: Cole Lyman <[email protected]>
    Date:   Fri Dec 10 15:01:39 2021 -0700

        Add test case for `find_indels_substitutions`

        This test case is extracted from the CRISPRessoBatch integration test and
        provides an example where there is an insertion at the edge of the include
        index.

    commit 659ae34e8fd106f7ecc163b5bea0b5a80ab0283c
    Author: Cole Lyman <[email protected]>
    Date:   Fri Dec 10 11:37:07 2021 -0700

        Fix bug in CRISPRessoCompare where sample names were not properly set

        This was a place where it was (partially) missed during the crispresso2_info
        object refactoring.

    * Fix bug in `find_indels_substitutions`

    This bug occurred when there was a deletion at the end of a sequence, and was
    thus not properly accounted for.

    Co-authored-by: Kendell Clement <k.clement…
Snicker7 added a commit to edilytics/CRISPResso2 that referenced this pull request Apr 4, 2024
commit 603f2eff9d1aa21ae95f3e134da303b8018d3a33
Author: Samuel Nichols <[email protected]>
Date:   Fri Jan 12 09:48:20 2024 -0700

    fix guardrails partial

commit 22fc03183a8070c30dfb74d5c23575ac19019855
Author: Samuel Nichols <[email protected]>
Date:   Fri Jan 12 08:54:01 2024 -0700

    Add guardrail partial

commit e55f6b21972b578261bc5a864ce1d653d98f9e34
Author: Samuel Nichols <[email protected]>
Date:   Mon Jan 8 07:50:59 2024 -0700

    Functional guardrails, needs reports update

commit 6e968e9699ed59a47d88191d03768e042d8b60a4
Merge: 32b49685 e948ce10
Author: Samuel Nichols <[email protected]>
Date:   Mon Dec 18 13:34:36 2023 -0700

    Merge branch 'guardrails-clean-history' of https://github.com/edilytics/CRISPResso2 into guardrails-clean-history

commit 32b49685da320501dad2b0ebbb57887b66220ba8
Author: Samuel Nichols <[email protected]>
Date:   Fri Dec 15 15:27:04 2023 -0700

    Include guardrail functions

commit 4e309cf6f732565d635de3d4c5d074ada3027e2d
Author: Cole Lyman <[email protected]>
Date:   Mon Dec 18 10:51:55 2023 -0700

    Refactor to use CRISPRessoReports module

commit e648dc087c0055bc5d2fca13c64071a371dea941
Author: Cole Lyman <[email protected]>
Date:   Mon Dec 18 10:51:11 2023 -0700

    Add CRISPRessoReports subtree

commit e948ce107ebb0d1d99010ed12e937f34b5e607d4
Author: Samuel Nichols <[email protected]>
Date:   Fri Dec 15 15:27:04 2023 -0700

    Include guardrail functions

commit d33c748871a625facfe8d792e29c77ab9779138f
Author: Kendell Clement <[email protected]>
Date:   Tue Nov 7 16:31:06 2023 -0700

    Include parameter --assign_ambiguous_alignments_to_first_reference in readme

commit a1435f7f491a6a61434f3051e39f39a4c9bf1edc
Author: Kendell Clement <[email protected]>
Date:   Wed Oct 11 17:17:30 2023 -0600

    Enable quantification by sgRNA (#348)

    This PR includes:
    - storing the sgRNA-specific editing locations in the crispresso2_info object. Previously, each amplicon would record the indices of quantification windows across the guide, but not for individual guides. This stores the information for each guide in crispresso2_info['results']['refs'][reference_name]['sgRNA_include_idxs']
    - a script (count_sgRNA_specific_edits.py) to parse through an allele table output from a completed CRISPResso run (`--write_detailed_allele_table` flag required) to count edits in each sgRNA separately.

    I don't have a good double-edited sample handy, but it can be run on the demo HDR data [hdr.fastq.gz](http://crispresso.pinellolab.org/static/demo/hdr.fastq.gz) using the command:

    ```
    CRISPResso -r1 hdr.fastq.gz -a acatttgcttctgacacaactgtgttcactagcaacctcaaacagacaccatggtgcatctgactcctgTggagaagtctgccgttactgccctgtggggcaaggtgaacgtggatgaagttggtggtgaggccctgggcaggttggtatcaaggtta -e acatttgcttctgacacaactgtgttcactagcaacctcaaacagacaccatggtgcaCctgactccGgaggagaagtctgccgttactgcGctgtggggcaaggtgaacgtggatgaagttggtggtgaggccctgggcaggttggtatcaaggtta -c atggtgcatctgactcctgTggagaagtctgccgttactgccctgtggggcaaggtgaacgtggatgaagttggtggtgaggccctgggcag -g TGCACCATGGTGTCTGTTTG,GATGAAGTTGGTGGTGAGGCCC --write_detailed_allele_table  -n hdr3 -p max -gn guide1,guide2
    ```

    ```
    python CRISPResso2/scripts/count_sgRNA_specific_edits.py -f CRISPResso_on_hdr3
    ```

    This produces:
    ```
    Processed 25000 alleles
    Reference: Reference (2391/23415 modified reads)
            UNMODIFIED: 21024
            MODIFIED guide1: 2359
            MODIFIED guide2: 32
    Reference: HDR (856/1577 modified reads)
            UNMODIFIED: 721
            MODIFIED guide1: 854
            MODIFIED guide1 + guide2: 1
            MODIFIED guide2: 1
    ```
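
The per-guide counting described in this commit can be sketched roughly as follows. This is an illustration only, not the code in `count_sgRNA_specific_edits.py`: the function names are hypothetical, and it assumes `sgRNA_include_idxs` holds one collection of reference indices per guide, in the same order as the guide names.

```python
def guides_hit(modified_positions, sgRNA_include_idxs, guide_names):
    """Return the guides whose quantification window contains a modified position."""
    modified = set(modified_positions)
    return [
        name
        for name, idxs in zip(guide_names, sgRNA_include_idxs)
        if modified.intersection(idxs)
    ]


def label_read(modified_positions, sgRNA_include_idxs, guide_names):
    """Build a label in the style of the output above (assumed format)."""
    hits = guides_hit(modified_positions, sgRNA_include_idxs, guide_names)
    return "MODIFIED " + " + ".join(hits) if hits else "UNMODIFIED"


# Made-up windows: guide1 covers reference positions 10-12, guide2 covers 40-42.
windows = [range(10, 13), range(40, 43)]
names = ["guide1", "guide2"]
print(label_read([11], windows, names))
print(label_read([11, 41], windows, names))
print(label_read([25], windows, names))
```

In this sketch a read whose only changes fall outside every guide window is labelled `UNMODIFIED`; whether the real script treats such reads the same way is an assumption.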

commit 2e3da02fdbed2fa8ae02a277763d65a502459827
Author: Cole Lyman <[email protected]>
Date:   Tue Oct 10 15:29:08 2023 -0600

    changed tuple to list for matplotlib change (#31) (#346)

    Co-authored-by: mbowcut2 <[email protected]>

commit cd3c332135fe4db0f9218e3d87263d5c65838ed9
Author: Kendell Clement <[email protected]>
Date:   Sun Oct 1 01:54:46 2023 -0600

    rename script to camel case

commit 7c719d65fb36ac7654db9040f226564ea28fcab9
Author: Kendell Clement <[email protected]>
Date:   Sun Oct 1 01:53:44 2023 -0600

    Add new script for counting high quality bases

commit f97cd2795e89464bcc9321ccfdbca3e6af2bcb4f
Author: Kendell Clement <[email protected]>
Date:   Thu Sep 14 15:15:30 2023 -0600

    Prime editing alignment params (#336)

    Adds two parameters to control alignment of pegRNA components: --prime_editing_gap_open_penalty and --prime_editing_gap_extend_penalty.

    CRISPResso checks to see whether the pegRNA spacer and extension sequence are in the correct orientation, but sometimes they could align in the incorrect orientation with a higher score (e.g. via insertion of multiple gaps, whereas a single long gap would be preferred). Introducing these two parameters allows users to adjust the alignment parameters specifically for these prime-editing checks without adjusting the global alignment parameters which will be applied to reads that are aligned to the WT reference/prime-editing reference sequences.

    The new prime_editing_gap_open_penalty is set to -50, a higher gap open penalty than the default needleman_wunsch_gap_open penalty (-20). This commit breaks backward-reproducibility, but mostly in the checking of pegRNA component orientation - so previously some CRISPResso runs would have failed and produced an error, but now they will (hopefully) succeed. To achieve complete backward reproducibility, add the flag --prime_editing_gap_open_penalty -20 to runs.
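
To see why a larger gap open penalty favours a single long gap, a small arithmetic sketch may help. It assumes one common affine-gap convention (one open charge per gap plus an extend charge per gapped position) and a gap extend value of -2 chosen for illustration; neither is taken from the CRISPResso aligner itself.

```python
def affine_gap_cost(gap_lengths, gap_open, gap_extend):
    """Total gap penalty: each gap pays gap_open once plus gap_extend per position."""
    return sum(gap_open + gap_extend * length for length in gap_lengths)


GAP_EXTEND = -2  # assumed value, for illustration only

for gap_open in (-20, -50):
    one_long = affine_gap_cost([6], gap_open, GAP_EXTEND)
    three_short = affine_gap_cost([2, 2, 2], gap_open, GAP_EXTEND)
    # The margin is how many extra match points the fragmented alignment
    # would need in order to outscore the single-gap alignment.
    print(gap_open, one_long, three_short, one_long - three_short)
```

Under these assumptions the fragmented alignment must recover more than 40 extra points of matches to win at a gap open of -20, but more than 100 at -50, so the stricter penalty makes the spurious multi-gap alignment much less likely.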

commit 64cbf36dae85cffa2c15e73f2a7ee8aa1077d917
Author: Cole Lyman <[email protected]>
Date:   Thu Sep 7 16:43:30 2023 -0600

    Fix samtools piping (#325)

    * Remove samtools pipe stderr to stdout

    Sometimes some of the libraries that samtools depends on don't have the correct
    version information, and as such samtools will report this to stderr when run.
    Because we pipe the output of samtools, we expect it to be valid SAM format, but
    when these library version messages are reported, it breaks CRISPRessoWGS.

    * Remove extra spacing at end of lines and add missing comma in WGS

    * Log stderr from samtools in CRISPRessoWGS
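
The idea of keeping stderr out of the parsed stream can be sketched with the standard library. This is not the CRISPRessoWGS code: `run_and_capture` is a hypothetical helper, and a small Python child process stands in for samtools so the example runs anywhere.

```python
import subprocess
import sys


def run_and_capture(cmd):
    """Run cmd and return (stdout, stderr) as two separate strings."""
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,  # kept separate; stderr=subprocess.STDOUT would merge them
        universal_newlines=True,
        check=True,
    )
    return result.stdout, result.stderr


# Stand-in for samtools: one record on stdout, one library warning on stderr.
fake_samtools = [
    sys.executable,
    "-c",
    "import sys; print('SAM_RECORD'); print('version warning', file=sys.stderr)",
]
records, warnings = run_and_capture(fake_samtools)
print(repr(records), repr(warnings))
```

Only `records` would be handed to the SAM parser, while `warnings` can be written to the log.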

commit 8feff4101f27406d9d88ace97d31a518276bff3f
Author: Cole Lyman <[email protected]>
Date:   Fri Sep 1 09:43:56 2023 -0600

    Replace link to CRISPResso schematic with raw URL in README (#329)

    * Replace link to CRISPResso schematic with raw URL

    * Add new lines to the beginning of unordered lists

commit 2e9e6bff5bcc536d5e2ba1440d1ab96d9d47efd6
Author: Kendell Clement <[email protected]>
Date:   Thu Aug 10 00:52:12 2023 -0600

    Try to unbreak CircleCI

commit ae5b95246cb0f6d66c4cbfb50cf8f5a9626b0827
Author: Kendell Clement <[email protected]>
Date:   Thu Aug 10 00:17:27 2023 -0600

    Center command line text messages

commit 4d9c71ecf2248c9bb1e10430178dc318b6621c8b
Author: Kendell Clement <[email protected]>
Date:   Thu Aug 10 00:17:07 2023 -0600

    Fix bug in prime-editing scaffold-incorporation plotting

    If read is too short, scaffold incorporation detection will fail because it will check beyond the length of the read.

commit 2b36a1a5c35e8a93516ce8baf464595615e0f402
Author: Kendell Clement <[email protected]>
Date:   Wed Aug 9 15:29:48 2023 -0600

    CRISPRessoPooled --compile_postrun_references bug fixes

commit 3e04d1d402bcf95edd39fc7c8c9af61bb380f9db
Author: Kendell Clement <[email protected]>
Date:   Tue Aug 8 23:30:15 2023 -0600

    Fix missing ' in Pooled --demultiplex_only_at_amplicons

commit 06af527f9e2020c5cf251e7f1cec0b1eca1c1664
Author: Cole Lyman <[email protected]>
Date:   Mon Jul 24 10:47:46 2023 -0600

    Sort pandas dataframes by # of reads and sequences so that the order is consistent (#316)

    * Make sorting stable

    * Including c files

    * Sort by #Reads instead of %Reads to avoid floating point errors

    ---------

    Co-authored-by: Samuel Nichols <[email protected]>
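
A deterministic ordering of this kind can be sketched without pandas. The sketch below is a plain-Python analogue, not the code from this commit: it sorts on the integer read count, which avoids comparing floating-point percentages, and breaks ties on the sequence so the result does not depend on input order.

```python
def sort_alleles(rows):
    """Sort (sequence, n_reads) pairs by read count descending, then by sequence."""
    return sorted(rows, key=lambda row: (-row[1], row[0]))


alleles = [("ACGT", 5), ("AAGT", 5), ("TTTT", 9)]
print(sort_alleles(alleles))
print(sort_alleles(list(reversed(alleles))))  # same order either way
```

With pandas, the comparable approach would be `DataFrame.sort_values` over both columns.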

commit de05533b3511a84f3b6b14fc2ef64db041613261
Author: Cole Lyman <[email protected]>
Date:   Thu Jul 6 13:54:45 2023 -0600

    Fix multiprocessing lambda pickling (#311)

    * Fix running plots in parallel

    The reason the plots were running slower before this change is because I was
    calling the plot function, not passing it to `submit`. So it was essentially
    running in serial, but worse because it was still spinning up/down the
    processes.

    * Fix multiprocessing lambda pickling (#20)

    * Refactor process_futures to be a dict

    This makes debugging much easier because you can associate the arguments to the
    future with the results.

    * Fix the pickling error when running in multiprocessing

    Only top-level functions (not lambdas) can be pickled to use in multiprocessing
    pools, thus the lambdas are converted to a regular function.

    * Further fixes to pickling multiprocessing error (#21)

    * Refactor process_futures to be a dict

    This makes debugging much easier because you can associate the arguments to the
    future with the results.

    * Fix the pickling error when running in multiprocessing

    Only top-level functions (not lambdas) can be pickled to use in multiprocessing
    pools, thus the lambdas are converted to a regular function.

    * Use Counter instead of defaultdict in CRISPRessoCORE

    * Update process_futures to dict in Batch and Aggregate
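
The pickling restriction mentioned above can be checked directly. Functions are pickled by reference (module plus qualified name), and a lambda's name `<lambda>` cannot be looked up again, so it cannot be sent to a worker process. `is_picklable` is a hypothetical helper written for this sketch.

```python
import pickle


def is_picklable(obj):
    """Return True if obj survives pickle.dumps, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


print(is_picklable(lambda x: x * 2))  # a lambda has no importable name
print(is_picklable(len))              # a named function is pickled by reference
```

Replacing a lambda with a named module-level function (or `functools.partial` around one) is the usual way around this.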

commit ebb016dff46c280dce8c3c09e8ac0e0cc25d4d74
Author: Kendell Clement <[email protected]>
Date:   Mon Jul 3 17:12:09 2023 -0600

    Enable CRISPRessoPooled multiprocessing when os allows multi-thread file append

commit 7285da0e987b77b72c8885bb35940e0f50c146bd
Author: Kendell Clement <[email protected]>
Date:   Fri Jun 23 16:50:33 2023 -0600

    Fix print bug for invalid fastq

commit 9acdeac67441f9a1d55ac94b153bcb68fb89b92c
Author: kclem <[email protected]>
Date:   Wed Jun 21 16:03:48 2023 -0600

    Slugify before creating filename - replaces invalid characters in batch names with _
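
A minimal sketch of this kind of slugification, assuming that letters, digits, `.`, `-` and `_` are the characters worth keeping (the actual set used by CRISPResso may differ, and `slugify` here is a hypothetical stand-in):

```python
import re


def slugify(name):
    """Replace every character outside the assumed safe set with an underscore."""
    return re.sub(r"[^A-Za-z0-9._-]", "_", name)


print(slugify("batch 1/run:A"))
```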

commit f97e29c67de4c80b8d6b9cf334f363be4b514ade
Author: Cole Lyman <[email protected]>
Date:   Wed Jun 21 14:43:43 2023 -0600

    Add verbosity argument to CRISPRessoAggregate (#18) fixes #306 (#307)

    * Add verbosity argument to CRISPRessoAggregate (#18)

    * Allow for amplicon and guide seqs to be some variant of NA in batch (#19)

    This was discovered when attempting to infer amplicon sequences in batch mode on
    the web interface, NAs were supplied for the amplicon sequences to the sub
    CRISPResso commands.

commit 32e1e9797da5c3033cdc588e92f06b8813961953
Author: Mark Clement <[email protected]>
Date:   Wed Jun 21 14:01:00 2023 -0600

    Allow for interrogation of overlapping sgRNA sites

commit 7248ba8c4deee125ad1ec12fdf1294a84d5f6f93
Author: Kendell Clement <[email protected]>
Date:   Mon Jun 12 12:16:47 2023 -0600

    Check input fastq file format

    Asserts input format of fastq files - including if gzipped files are missing the gz suffix.
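
One way to detect a gzipped file that lacks the `.gz` suffix is to look at its first two bytes, which are `1f 8b` for gzip. The sketch below is an assumption about how such a check could look, not the check CRISPResso performs; both function names are hypothetical.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"


def is_gzipped(path):
    """Detect gzip by its magic bytes instead of trusting the file suffix."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


def first_record_looks_like_fastq(path):
    """Check that the first four lines form a plausible FASTQ record."""
    opener = gzip.open if is_gzipped(path) else open
    with opener(path, "rt") as handle:
        header, seq, plus, qual = (handle.readline().rstrip("\n") for _ in range(4))
    return (
        header.startswith("@")
        and plus.startswith("+")
        and len(seq) > 0
        and len(seq) == len(qual)
    )
```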

commit 83c8ab8f462e7d8c1d04c08c1a398b874f517251
Author: Kendell Clement <[email protected]>
Date:   Mon Jun 5 13:41:55 2023 -0600

    Fix CRISPRessoArgParser

commit 14a2c8577f566e1b72d5f4e72cd6cd22079610be
Author: Kendell Clement <[email protected]>
Date:   Mon Jun 5 13:29:31 2023 -0600

    Cosmetic updates for command-line use

    - version bump to 2.2.13
    - If no args are provided, the command line version will print out an abbreviated help message
    - parameters can be excluded from CRISPRessoArgParser

commit 1cd54bc1d03360c3d8121ba9e66b3589fe1cf252
Author: Cole Lyman <[email protected]>
Date:   Thu May 11 14:31:47 2023 -0600

    Fix multiprocessing error, don't start pool when only using single thread (#302)

    * Update README to have consistent use of `--base_editor_output` (#16)

    * Add files via upload

    * Only start process pools when using multiple processes

    This is mainly to solve the issue when running on AWS Lambda, but this should
    improve single core performance overall.

    ---------

    Co-authored-by: Kendell Clement <[email protected]>
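
The pattern of skipping the pool entirely for single-process runs can be sketched like this. `run_tasks` is a hypothetical helper, not the function used in CRISPResso.

```python
from concurrent.futures import ProcessPoolExecutor


def run_tasks(func, arg_list, n_processes=1):
    """Run func over arg_list, starting a process pool only when it can help."""
    if n_processes <= 1:
        # No pool: avoids process start-up cost and works where
        # spawning worker processes is restricted.
        return [func(arg) for arg in arg_list]
    with ProcessPoolExecutor(max_workers=n_processes) as pool:
        return list(pool.map(func, arg_list))


print(run_tasks(abs, [-1, 2, -3]))
```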

commit 92a705c939b370373a70cf6ae9f1616de33288b9
Author: Cole Lyman <[email protected]>
Date:   Thu May 11 14:31:06 2023 -0600

    Update `base_editor` parameters in README and add Plot Harness (#301)

    * Update README to have consistent use of `--base_editor_output` (#16)

    * Add files via upload

    ---------

    Co-authored-by: Kendell Clement <[email protected]>

commit 7d46c4490235df45c5546b1b470e4e6a99727031
Author: Cole Lyman <[email protected]>
Date:   Wed May 10 15:41:33 2023 -0600

    Clarify CRISPRessoWGS intended use (#303)

    * Update README to have consistent use of `--base_editor_output` (#16)

    * Add sample plotting jupyter notebook

    * Add clarifying info to CRISPRessoWGS description

    Clarify WGS usage

commit 833a701787bb47674b3e921c38cac6189c775cf7
Author: Kendell Clement <[email protected]>
Date:   Thu May 4 17:02:46 2023 -0400

    Remove debug print statements

commit 712eb2a11825e8d36f2870deb12b35486bd633fb
Author: Kendell Clement <[email protected]>
Date:   Thu May 4 16:40:07 2023 -0400

    Allow dashes in filenames resolve #73

commit a439f094745b2b5e7f032f0777d4c67e6d6f93c5
Author: Kendell Clement <[email protected]>
Date:   Sat Apr 22 23:41:58 2023 -0400

    Raise exceptions from within futures in plot_pool

commit 7e807a60de2a9d18bccd034b87106ceaf7153338
Author: Kendell Clement <[email protected]>
Date:   Sat Apr 22 23:38:56 2023 -0400

    Fix future pandas indexing warning

    Pandas error was "FutureWarning: Calling float on a single element Series is deprecated and will raise a TypeError in the future. Use float(ser.iloc[0]) instead"

commit 304a92aa7a7ef8c705cb070dce25d9a2e5745ba9
Author: Cole Lyman <[email protected]>
Date:   Thu Apr 20 13:59:27 2023 -0600

    Remove debug print statements fixes #295 (#297)

    The format string option used here is only available in Python version >=3.8.

commit 478c06f784603e96d20f96e91993fdcc4ac35c8a
Author: Kendell Clement <[email protected]>
Date:   Thu Apr 13 12:09:26 2023 -0400

    Update plotCustomAllelePlot.py script for #292 (#293)

    Update type of 'max_rows' param to int
    Fix location of 'args' in crispresso2_info object

commit bcdae39e05d530f4a4e78738c3b30f7664981919
Author: Kendell Clement <[email protected]>
Date:   Mon Mar 27 13:18:34 2023 -0400

    Update pooled parameter format

commit 546446e36e7e68b527767d6c31ec341a49df2059
Author: Kendell Clement <[email protected]>
Date:   Tue Feb 14 16:26:23 2023 -0500

    Fix running plots in parallel (#286)

    The reason the plots were running slower before this change is because I was
    calling the plot function, not passing it to `submit`. So it was essentially
    running in serial, but worse because it was still spinning up/down the
    processes.

    Co-authored-by: Cole Lyman <[email protected]>
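
The difference between calling the plot function and passing it to `submit` is easy to miss. The sketch below uses `ThreadPoolExecutor` to stay self-contained; `ProcessPoolExecutor` exposes the same `submit` API. `make_plot` is a made-up stand-in for a plotting function.

```python
from concurrent.futures import ThreadPoolExecutor


def make_plot(name):
    """Stand-in for a plotting function."""
    return "plotted " + name


with ThreadPoolExecutor(max_workers=2) as pool:
    # Buggy pattern: make_plot("a") would run here, in the caller, and only
    # its return value would be handed to submit():
    #     pool.submit(make_plot("a"))
    # Intended pattern: pass the callable and its arguments separately so
    # the pool runs it in a worker.
    futures = [pool.submit(make_plot, name) for name in ("a", "b")]
    results = [future.result() for future in futures]

print(results)
```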

commit d75f32a2eb5aeaaee866c09e5655a3e27af8b1a1
Author: kclem <[email protected]>
Date:   Fri Feb 10 15:45:15 2023 -0500

    Fix #283 to avoid filename collisions

    Previously, amplicon names longer than 21bp were truncated, but the check for uniqueness wasn't working, so it would overwrite some plot files. This fixes the filename collision and enforces uniqueness in reference filename prefixes. Thanks @mbiokyle29
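
One way to enforce unique truncated prefixes is sketched below. This is a hypothetical approach for illustration (appending a counter on collision), not necessarily how CRISPResso resolves the collision; it assumes the input names themselves are distinct.

```python
def unique_prefixes(names, max_len=21):
    """Truncate each name to max_len, adding a numeric suffix on collision."""
    seen = set()
    prefixes = {}
    for name in names:
        candidate = name[:max_len]
        counter = 2
        while candidate in seen:
            suffix = "_" + str(counter)
            candidate = name[: max_len - len(suffix)] + suffix
            counter += 1
        seen.add(candidate)
        prefixes[name] = candidate
    return prefixes


names = ["amplicon_with_a_very_long_name_A", "amplicon_with_a_very_long_name_B"]
print(unique_prefixes(names))
```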

commit e577318006cd17b2725bd028e5e56634c6eb829a
Author: kclem <[email protected]>
Date:   Mon Feb 6 16:37:25 2023 -0500

    Case-insensitive headers accepted in CRISPRessoPooled

commit d34927620a4a6126a9988b3041e76f60728abbfe
Author: Kendell Clement <[email protected]>
Date:   Tue Jan 31 13:48:33 2023 -0500

    Fix print statement in CORE

commit ee88b7ed89c395f68225a50dea44a2ad69d5e9a5
Author: Kendell Clement <[email protected]>
Date:   Tue Jan 31 13:22:51 2023 -0500

    Version bump to 2.2.12

commit 1d4679c72d0c8b4154317c9aff5179217198e2d7
Author: Kendell Clement <[email protected]>
Date:   Tue Jan 31 13:01:31 2023 -0500

    Status Updates + Pooled Mixed Mode Update (#279)

    * Implement logging handler to overwrite the latest log status to file

    * Add StatusHandler to CRISPRessoCORE log

    This will take the latest log output and write it to a file (`status.txt`), the
    catch being that with each log the file is overwritten so that one can easily
    tell where CRISPResso currently is and what the error is (if any). These changes
    include some slight refactoring in order to accomodate any potential parameter
    exceptions.

    * Add StatusHandler to CRISPRessoBatch and refactor `logger.warn` to `warn`

    * Add StatusHandler to CRISPRessoPooled and a little refactoring

    * Implement `percent_complete` to the status log

    * Add StatusHandler to CRISPRessoAggregate log

    * Add StatusHandler to CRISPRessoCompare log

    * Add StatusHandler to CRISPRessoPooledWGSCompare log

    * Add StatusHandler to CRISPRessoWGS log

    * Rename `status.txt` to `CRISPResso_status.txt`

    * Modify status log names to match the tool they are generated from

    * Add percent_complete stages to CRISPRessoCORE

    These also include log statements of each plot that is being generated as well
    as fixing some variable name collisions with `ind`.

    * Format the percentage in the log to be 2 decimal places

    * Change all plotting logs from `info` to `debug` and simplify progress

    This refactors how the progress of the plots is calculated, making it much
    simpler. Before this change we would have had to keep track of the number of
    times `percent_complete` was output, but now it simply updates the percent
    complete after each amplicon is finished processing. Hopefully this will make
    things easier to maintain even though it will be a little less "accurate" (not
    sure how accurate the original implementation was...).

    * Implemented shared console log handler across all CRISPResso* calls

    This allows for easy changes to logging formatting, which was inspired by having
    to change the default logging level. The default logging level needs to be set
    at `logging.DEBUG` in order for the debug log statements to not be ignored for
    the running and status logs.

    * Add ability to set the verbosity level to each CRISPResso* tool

    This allows users to set a verbosity level between 1 and 4 using the
    `-v`/`--verbosity` CLI parameter. If the `--debug` flag is present, then the
    level will default to 4, being the most verbose.

    * Implement showing the last seen `percent_complete` when none is provided

    * Keep track of and log when multiple parallel runs are completed

    These changes modify `CRISPRessoMultiProcessing.run_crispresso_cmds` such that
    we can now display when a run is completed. This potentially breaks how
    signals and interrupts are handled with multiple runs happening, but this needs
    to be reviewed.

    * Add debug and percentage complete to CRISPRessoBatch

    * Add percent complete to CRISPRessoPooled

    * Add debug and percent_complete message to CRISPRessoAggregate

    * Add `percent_complete` to CRISPRessoCompare

    * Add `percent_complete` to CRISPRessoPooledWGSCompare

    * Add status and `percent_complete` to CRISPRessoMeta

    * Add `verbosity` arguments to CRISPRessoCompare and CRISPRessoPooledWGSCompare

    * Fixing documentation to match pooled headers

    * Header removal bug fix change documentation to guide_seq

    * Update documentation and help feature for CRISPRessoPooled

    * Remove extra newlines from CRISPRessoPooled -h

    * Make variable names as clear as my firstborn child's name

    * Update one more variable name

    * Fix bug to flow CRISPRessoPooled options to sub command

    * Make amplicon file args variable name clear

    * Update how parameters are set and retrieved from parameter object

    The refactor in the previous commit changed the type of the arguments to a
    dictionary which doesn't have the parameters as attributes, and this commit
    fixes that error.

    * Add note in output header for change in default CRISPRessoPooled

    In the next release (2.3.0) the `--demultiplex_only_at_amplicons` will be the
    default when running in mixed-mode. This is to allow for inexact alignments of
    the reads and the amplicons to the genome. For more context, see this issue
    https://github.com/pinellolab/CRISPResso2/issues/276

    * Clarify the verbosity parameter help message

    * Separate out parameters to `normalize_name` in CRISPRessoCORE

    * Separate out parameters to `normalize_name` in CRISPRessoWGS

    * Separate out parameters to `normalize_name` in CRISPRessoPooled

    * Separate out parameters to `normalize_name` in CRISPRessoCompare

    * Fix bug in CRISPRessoPooled by replacing `database_id` with `normalize_name`

    * Refactor `run_crispresso_cmds` to not require a `logger`

    This commit implements the functionality to make the `logger` object optional by
    seeing which module called the `run_crispresso_cmds` function and obtaining the
    correct object from that module name.

    The function also immediately returns when no commands are passed to it.

    * Add amplicon name to plotting debug statements in CRISPRessoCORE
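
The verbosity handling described in the list above could look roughly like this. The mapping from levels 1-4 to `logging` levels and the default of 3 are assumptions made for this sketch; only the 1-4 range and the `--debug` behaviour come from the commit message.

```python
import logging

# Assumed mapping; the commit only states that levels run from 1 to 4.
VERBOSITY_TO_LEVEL = {
    1: logging.ERROR,
    2: logging.WARNING,
    3: logging.INFO,
    4: logging.DEBUG,
}


def resolve_log_level(verbosity=3, debug=False):
    """Translate a -v/--verbosity value into a logging level."""
    if debug:
        verbosity = 4  # --debug forces the most verbose level
    if verbosity not in VERBOSITY_TO_LEVEL:
        raise ValueError("verbosity must be between 1 and 4")
    return VERBOSITY_TO_LEVEL[verbosity]


print(resolve_log_level(2), resolve_log_level(1, debug=True))
```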

    ---------

    Co-authored-by: Cole Lyman <[email protected]>
    Co-authored-by: Cole Lyman <[email protected]>
    Co-authored-by: Cole Lyman <[email protected]>
    Co-authored-by: Samuel Nichols <[email protected]>
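
A status handler of the kind described in this commit can be sketched as a `logging.Handler` subclass that overwrites one file on every record. This is an illustration under assumptions, not the CRISPResso implementation: the file layout and the use of `extra={"percent_complete": ...}` to pass the percentage are made up for the sketch.

```python
import logging


class StatusHandler(logging.Handler):
    """Overwrite a status file with the most recent log message."""

    def __init__(self, filename):
        super().__init__()
        self.filename = filename
        self.last_percent_complete = 0.0

    def emit(self, record):
        percent = getattr(record, "percent_complete", None)
        if percent is None:
            # Reuse the last seen value when the record carries none.
            percent = self.last_percent_complete
        else:
            self.last_percent_complete = percent
        # Mode "w" truncates the file, so it only ever holds the latest status.
        with open(self.filename, "w") as handle:
            handle.write("{:.2f}% {}\n".format(percent, self.format(record)))


# Usage (writes to the given path):
#     logger.addHandler(StatusHandler("CRISPResso_status.txt"))
#     logger.info("Aligning reads", extra={"percent_complete": 50})
```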

commit ff7eca76e6a3a08af4ac18ac4e88d20f2a06b1f9
Author: Kendell Clement <[email protected]>
Date:   Thu Jan 26 15:27:27 2023 -0500

    CRISPRessoPooled custom header fix (#278)

    * Fixing documentation to match pooled headers

    * Header removal bug fix change documentation to guide_seq

    * Update documentation and help feature for CRISPRessoPooled

    * Remove extra newlines from CRISPRessoPooled -h

    * Make variable names as clear as my firstborn child's name

    * Update one more variable name

    Co-authored-by: Samuel Nichols <[email protected]>

commit 104866e1080c973bb025d1a5ba59b19dca1658af
Author: Cole Lyman <[email protected]>
Date:   Thu Jan 5 14:00:26 2023 -0700

    Fix deprecated numpy type names (fixes #269) (#270)

    In the most recent version of numpy (1.24) some of the types have been
    deprecated. This commit fixes these errors.

commit 58a8e42df88b66fad6b4f6ad04a5b9d9d43d01b4
Author: Cole Lyman <[email protected]>
Date:   Thu Jan 5 06:49:35 2023 -0700

    Add snippet about installing CRISPResso2 via bioconda on Apple silicon (#274)

    I have suffered enough trying to debug my installation, so hopefully this helps
    someone else.

    Co-authored-by: Cole Lyman <[email protected]>

commit b9851e98104602eb78c2b384105267624295e9d3
Author: Cole Lyman <[email protected]>
Date:   Thu Dec 22 13:30:23 2022 -0700

    Fix bug when pooled bam is input (#265)

    This change checks to see if a bam file was input, and if so it doesn't try to
    remove any intermediate files because there aren't any.

    Co-authored-by: Cole Lyman <[email protected]>

commit b822612642043e75a19042941f69b457ce51f517
Author: Kendell Clement <[email protected]>
Date:   Mon Dec 19 15:26:45 2022 -0500

    Delete vscode settings

commit b99aa624dec68ef7d19264340ce0cafa829625f4
Author: Kendell Clement <[email protected]>
Date:   Mon Dec 19 13:29:14 2022 -0500

    Clarify input param help for pooled bam

commit 3fae1e8b821ec6b1890bff6561fa8fa67dc49a04
Author: Kendell Clement <[email protected]>
Date:   Mon Dec 19 13:28:54 2022 -0500

    Fix #235 - Cigar string is * if read unaligned

    Previously, the bam would set the cigar string to 0 if the read was unaligned. This breaks the sam->bam conversion and causes the errors in #235.
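
For reference, a SAM record for an unaligned read uses `*` in the CIGAR column, as the SAM specification prescribes when no alignment is available. The helper below is a hypothetical sketch of such a record, not the code CRISPResso uses to write its output.

```python
def unaligned_sam_line(name, seq, qual):
    """Build the 11 mandatory SAM fields for an unmapped read."""
    fields = [
        name,  # QNAME
        "4",   # FLAG: 0x4 marks the read as unmapped
        "*",   # RNAME: no reference
        "0",   # POS
        "0",   # MAPQ
        "*",   # CIGAR: '*' means unavailable (not '0')
        "*",   # RNEXT
        "0",   # PNEXT
        "0",   # TLEN
        seq,
        qual,
    ]
    return "\t".join(fields)


print(unaligned_sam_line("read1", "ACGT", "IIII"))
```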

commit c65ba07dc5a983453cdf7bb1e27005230dac6f1b
Author: Cole Lyman <[email protected]>
Date:   Thu Dec 8 13:48:17 2022 -0700

    Add deprecation notice (#260)

    * Add FLASh and Trimmomatic deprecation notice to CLI output

    * Add Edilytics email address to CLI output

commit 2a30e5a45f5350ee7c6435bce1cd4edc4d31668a
Author: Kendell Clement <[email protected]>
Date:   Tue Dec 6 12:16:19 2022 -0500

    Format filterReadsOnSequencePresence script

commit 9d764414edd88a46ad5e4f496e4f1c8d5d60ce3e
Author: Kendell Clement <[email protected]>
Date:   Fri Dec 2 22:12:54 2022 -0500

    Clarify default CRISPRessoPooled settings for use_legacy_bowtie2_options_string

commit 9ddea40f7f02b546941ddaa4c71fc5283075051a
Author: kclem <[email protected]>
Date:   Mon Nov 14 10:33:04 2022 -0500

    Add check for prime editing extension sequence in prime edited sequence

    If the user specifies prime_editing_override_prime_edited_ref_seq, it might not contain the extension seq (if they don't provide the extension seq in the appropriate orientation), so check that here. The extension sequence should be provided as the reverse complement of the prime edited sequence.
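
The orientation check can be sketched as a reverse-complement lookup. Both function names are hypothetical and the sequences are made up; the real check in CRISPResso may be alignment-based rather than an exact substring match.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def extension_in_edited_ref(extension_seq, prime_edited_ref_seq):
    """True if the extension, reverse-complemented, occurs in the edited reference."""
    return reverse_complement(extension_seq).upper() in prime_edited_ref_seq.upper()


edited_ref = "GATTACAGGCTTAAC"
print(extension_in_edited_ref("AAGCCTG", edited_ref))  # given as reverse complement
print(extension_in_edited_ref("CAGGCTT", edited_ref))  # given in the same orientation
```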

commit 152f2dd5001da7090641ee8a1326bde9f7e8104e
Author: kclem <[email protected]>
Date:   Wed Nov 9 11:53:41 2022 -0500

    Version bump to 2.2.11a

commit 9ed356e3a0c6c316d0860d121772f80ddca6de1d
Author: kclem <[email protected]>
Date:   Wed Nov 9 11:47:30 2022 -0500

    Add param to override prime editing sequence checks

    CRISPResso checks that prime editing guides are provided in the proper orientation (e.g. pegRNA 3'->5', spacer sequence 5'->3') and checks these orientations by alignment. Sometimes, the alignment can be better in the opposite direction, and this parameter allows these checks to be overridden. Otherwise, these checks would halt the program and produce the output 'The prime editing pegRNA spacer sequence appears to be given in the 3\'->5\' order. The prime editing pegRNA spacer sequence (--prime_editing_pegRNA_spacer_seq) must be given in the RNA 5\'->3\' order.'

commit 39dd80afb98a22b7edb6f801c363d86bb77eeb5b
Author: kclem <[email protected]>
Date:   Wed Nov 9 10:06:51 2022 -0500

    Update filterReadsOnSequencePresence.py

commit fe55526927e3fb6e17c9a8a6f59c7057bc1e14eb
Author: Kendell Clement <[email protected]>
Date:   Mon Nov 7 22:25:16 2022 -0500

    Add script to filter input based on sequence presence

commit 713e57a19c35180035ca35e11a5820065eda0198
Author: Kendell Clement <[email protected]>
Date:   Tue Oct 18 16:02:26 2022 -0400

    Allow spaces in read names for CRISPRessoWGS

commit 39ce008bdddccdd8229c0ba185dce78bc2f66968
Author: Cole Lyman <[email protected]>
Date:   Sat Oct 8 21:09:58 2022 -0600

    Fix typo of CRISPResssoPlot when plotting nucleotide quilt (#250)

commit 6a2b342c8503b7327c0a2414edfbd16912d60ca5
Author: Kendell Clement <[email protected]>
Date:   Sat Oct 8 23:08:47 2022 -0400

    Batch amplicon plots (#251)

    * Error out if HDR amplicon matches existing amplicon

    * Add check for amplicon sequence uniqueness

    * Fix bug with bam_input not having bam_output

    * Test for no returned lines in auto mode, version bump to 2.2.11

    * Fix pandas deprecation of df.append

commit 726b2b93d6e419a1b0aa6a968c97edc55b4cc5a8
Author: Kendell Clement <[email protected]>
Date:   Thu Oct 6 16:32:02 2022 -0400

    Fix CRISPRessoBatch plot pool bug when plots are suppressed

commit 7e5049c4dfb88cbc87c91935a91d1f51120a10c2
Author: Cole Lyman <[email protected]>
Date:   Wed Sep 21 21:04:51 2022 -0600

    Fix batch quilt plot name (#249)

    This fixes an incorrectly named allele quilt plot input in CRISPRessoBatch.

commit 1821ca5029c5a1485733f13ab3f2048b4f1fa04e
Author: Kendell Clement <[email protected]>
Date:   Thu Sep 15 15:49:08 2022 -0400

    Version bump to 2.2.10

commit c5f79aebfc1ae209f4ee320df250eed89a02787c
Author: Cole Lyman <[email protected]>
Date:   Wed Sep 14 14:24:55 2022 -0600

    Parallel plot refactor (#247)

    * Fix duplicate plotting in CRISPRessoBatch aggregate

    * Refactor multiprocessing plots in CRISPRessoBatch

    * Refactor multiprocessing plots in CRISPRessoCORE

    * Refactor multiprocessing plots for CRISPRessoAggregate

commit 4ed5e24e6cc1dd8068e2391573ae2438acd32db2
Author: Kendell Clement <[email protected]>
Date:   Tue Sep 13 14:12:11 2022 -0400

    print files in curr dir if Aggregate can't find files

commit ce25bc06f29988e7a10afd0b6a09ba0caf0950e0
Author: Kendell Clement <[email protected]>
Date:   Mon Sep 12 10:32:57 2022 -0400

    Spelling typo

commit c15f01c75083403f17c58c121b2afe97e9f2a1ec
Author: Kendell Clement <[email protected]>
Date:   Tue Sep 6 17:49:52 2022 -0400

    Add helper function to create alignment scoring matrix

    New scoring matrix can be created using CRISPResso2Align.make_matrix()

commit c80f82838c5a228b79ad4484092877cfee08e02c
Author: Cole Lyman <[email protected]>
Date:   Mon Aug 22 18:28:33 2022 -0600

    Add `zip_output` (#240)

    * Making zip of results

    * Zip command added; if zip is true, place_report_in_output_folder is also set to true, and zip removes all files while zipping

    * Adding --zip to compare and pooled/wgs compare

    * Add more formatting changes to CRISPRessoShared

    * Refactoring propagate_crispresso_options so only one version exists

    * Zip added to arguments_to_ignore and warning added when changing arguments

    * Restore styling

    * Update README to include --zip

    * Rename --zip to --zip_output

    * Change --zip to --zip_output in CompareCORE and PooledWGSCompareCORE

    * Bug fix arg to args

    Co-authored-by: Samuel Nichols <[email protected]>

commit 5de3d7286d8e33c7cf4d3615fce715806e72f511
Author: Kendell Clement <[email protected]>
Date:   Thu Aug 11 21:42:34 2022 -0400

    Fix fix to aggregate for CRISPRessoWGS

commit a2294c266f43b14969a5d6474076f31a77a57173
Author: Kendell Clement <[email protected]>
Date:   Thu Aug 11 21:40:50 2022 -0400

    Fix bug in aggregate for WGS

commit 7ce3eb4abe4b8ceac933272ac9cb16a8bedf26a3
Author: Kendell Clement <[email protected]>
Date:   Mon Aug 8 21:53:45 2022 -0400

    Update CRISPRessoWGS to allow non-word characters in region names

commit 040ac0033d6e250f4e3a412101874cf5e914e08a
Author: kclem <[email protected]>
Date:   Mon Aug 8 16:04:59 2022 -0400

    Enable processing of cram files by CRISPRessoWGS

    Adds --reference to samtools view when viewing cram files
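
Building the `samtools view` command with the reference only for CRAM input might look like this. `samtools_view_cmd` is a hypothetical helper, and detecting CRAM by file suffix is an assumption; `--reference` is the long form of the `samtools view -T` option.

```python
def samtools_view_cmd(alignment_file, region, reference_fasta=None):
    """Return the argument list for viewing one region of a BAM or CRAM file."""
    cmd = ["samtools", "view"]
    if alignment_file.lower().endswith(".cram"):
        if reference_fasta is None:
            raise ValueError("a reference FASTA is required to read CRAM files")
        cmd += ["--reference", reference_fasta]
    cmd += [alignment_file, region]
    return cmd


print(samtools_view_cmd("sample.cram", "chr1:100-200", "genome.fa"))
print(samtools_view_cmd("sample.bam", "chr1:100-200"))
```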

commit cf112a0caba8789e28530cc09171285ec6ea9b4c
Author: kclem <[email protected]>
Date:   Mon Aug 8 14:55:46 2022 -0400

    Auto amplicon detection for interleaved input

    Enables processing of interleaved fastq files for guess_guides and guess_amplicons, as well as get_most_frequent_reads. When interleaved input is present, the input is first separated into R1/R2 files, then processing is performed.
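
Separating interleaved input into R1/R2 can be sketched as reading records in pairs of four lines each. This is a simplified illustration operating on in-memory lines, with a hypothetical function name; the actual implementation works on files.

```python
from itertools import islice


def split_interleaved(lines):
    """Split interleaved FASTQ lines (R1 record, R2 record, ...) into two lists."""
    r1_lines, r2_lines = [], []
    iterator = iter(lines)
    while True:
        record = list(islice(iterator, 4))
        if not record:
            break
        mate = list(islice(iterator, 4))
        if len(record) != 4 or len(mate) != 4:
            raise ValueError("interleaved input ended in the middle of a read pair")
        r1_lines.extend(record)
        r2_lines.extend(mate)
    return r1_lines, r2_lines


interleaved = [
    "@pair1/1", "ACGT", "+", "IIII",
    "@pair1/2", "TTGA", "+", "IIII",
]
r1, r2 = split_interleaved(interleaved)
print(r1[0], r2[0])
```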

commit 4ba524dc7b947feca8a0f743837844f9febc2171
Author: Cole Lyman <[email protected]>
Date:   Thu Aug 4 11:32:11 2022 -0600

    Potential fix for aggregate plots in Batch mode (#237)

commit 6097a8a104d3f156ef7c08e196ac37e32bf04c71
Author: Kendell Clement <[email protected]>
Date:   Thu Jul 21 22:45:48 2022 -0400

    Fix pct_vectors in crispresso2_info json object

commit 65a079d86d6f386793397398f839c46014b54543
Author: Kendell Clement <[email protected]>
Date:   Wed Jul 20 23:46:37 2022 -0400

    Fix more readme spelling bugs

commit e817376ecd54cdea1f29e303ca25b9e7d1d38333
Author: Kendell Clement <[email protected]>
Date:   Wed Jul 20 23:42:23 2022 -0400

    Fix bug in readme spelling

commit 49740ba1d66ed6d13a9e154b8b17bc8b5186581d
Author: Kendell Clement <[email protected]>
Date:   Wed Jul 20 16:10:09 2022 -0400

    Fix loading of crispresso info from WGS and Pooled

commit b68a43271115251b18e8955e285ccc18f549e8cd
Author: Kendell Clement <[email protected]>
Date:   Thu Jul 14 14:11:04 2022 -0400

    Add plotly to dockerfile

commit b0b7d41d697304d0d5fc93e3346c9de1b98ba41d
Author: Kendell Clement <[email protected]>
Date:   Thu Jul 14 14:10:00 2022 -0400

    Fix #231 Allow N's in bam output (Try 2)

commit c460b3e73fd06a230dbac2e37c86b833144ebf94
Author: Kendell Clement <[email protected]>
Date:   Thu Jul 14 14:09:10 2022 -0400

    Revert "Fix #231 Allow N's in bam output"

    This reverts commit 2f6ad1dbe05210af9ccc1b1f17783cd212a888d3.

commit 2f6ad1dbe05210af9ccc1b1f17783cd212a888d3
Author: Kendell Clement <[email protected]>
Date:   Thu Jul 14 13:52:37 2022 -0400

    Fix #231 Allow N's in bam output

commit 0a2419e518dc9b3520058c3927f98b31cd51347e
Author: Cole Lyman <[email protected]>
Date:   Fri Jul 8 21:10:01 2022 -0600

    Fix bug when name is provided instead of amplicon_name in pooled input file (#229)

    Also, raise an exception (instead of incorrectly executing) when there are not
    enough matched parameters in the pooled input file.

commit cb58212379803788c04ca5793baaa760cbbeaa81
Author: Cole Lyman <[email protected]>
Date:   Fri Jul 8 21:09:49 2022 -0600

    Fix bug when comparing two samples with the same name. (#228)

commit e8a796f5f451409cbafed4404dfba4b6b8a124ca
Author: Kendell Clement <[email protected]>
Date:   Thu Jun 23 21:30:23 2022 -0400

    Version bump to 2.2.9

commit 632143ddedea48bab9229baeb4bf3ea4d1f658d6
Author: Cole Lyman <[email protected]>
Date:   Mon Jun 20 19:53:14 2022 -0600

    Don't run global frameshift plot when there are no reads (#226)

    When there are no reads (i.e. global_MODIFIED_FRAMESHIFT +
    global_MODIFIED_NON_FRAMESHIFT + global_NON_MODIFIED_NON_FRAMESHIFT == 0) there
    was a bug when trying to compute the pie chart, because all of the values in the
    pie chart are 0. This fix makes sure that there is at least one read so that
    the plot can be constructed properly.
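
The guard described above can be sketched as follows. The function name is hypothetical; the three arguments stand in for the global_MODIFIED_FRAMESHIFT, global_MODIFIED_NON_FRAMESHIFT, and global_NON_MODIFIED_NON_FRAMESHIFT counts named in the message.

```python
def should_plot_global_frameshift(modified_frameshift,
                                  modified_non_frameshift,
                                  non_modified_non_frameshift):
    """Return True only when at least one read contributes to the pie chart.

    A pie chart whose wedges are all zero cannot be drawn, so the caller
    is expected to skip the plot when this returns False.
    """
    total_reads = (modified_frameshift
                   + modified_non_frameshift
                   + non_modified_non_frameshift)
    return total_reads > 0
```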

commit 4bb06218e835d2624d53fd401542caef6f8a3a55
Author: kclem <[email protected]>
Date:   Fri Jun 3 16:57:02 2022 -0400

    Improvements for guide inference in 'auto' mode

    In 'auto' mode, a putative guide sequence is selected at the site of maximal editing. If the site of maximal editing happens near the end of the guide (e.g. base 0), many things will break (e.g. quantification windows). This update excludes bases from being used to find the guide using the --exclude_bp_from_left and --exclude_bp_from_right parameters. By default, these parameters are 15bp, so the first and last 15bp cannot be selected as the site of maximal editing and thus cannot become the site of a guide sequence. In addition, the site of maximal editing must have 3x the magnitude of the background.
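
A rough sketch of the selection rule described above, under stated assumptions: the function name and its inputs are hypothetical, and "background" is taken here to be the median editing frequency over the candidate positions, which the message does not specify. The actual implementation may define both differently.

```python
from statistics import median


def find_max_editing_site(mod_freqs, exclude_left=15, exclude_right=15,
                          fold_over_background=3.0):
    """Return the index of the maximal-editing site, or None if none qualifies.

    mod_freqs holds one editing frequency per amplicon position. Positions
    within exclude_left of the start or exclude_right of the end are skipped.
    """
    candidates = range(exclude_left, len(mod_freqs) - exclude_right)
    if len(candidates) == 0:
        return None  # amplicon too short once the ends are excluded
    best = max(candidates, key=lambda i: mod_freqs[i])
    # Assumption: background is the median over the candidate positions.
    background = median(mod_freqs[i] for i in candidates)
    if mod_freqs[best] <= 0 or mod_freqs[best] < fold_over_background * background:
        return None
    return best
```

With the ends excluded, a spike at base 0 no longer wins, and a flat profile yields no guide at all rather than an arbitrary one.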

commit 9d64de187835b2553ad2b4374d32edab27f83645
Author: Kendell Clement <[email protected]>
Date:   Thu Jun 2 20:22:25 2022 -0400

    Update README.md

commit 6aafc5387986f5089ba55b68d128343d68052792
Author: Simon P Shen <[email protected]>
Date:   Tue May 31 17:42:53 2022 -0400

    directory in quotes in batch cmd (#222)

    Add quotes around output folder for folders that have spaces.
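
For illustration only, here is one way to build a shell command whose output folder may contain spaces. The commit adds quotes around the folder; this sketch swaps in Python's `shlex.quote` to the same end. The function name is hypothetical, and the `-r1` and `-o` flags are assumed to be the CRISPResso input and output options.

```python
import shlex


def build_crispresso_command(fastq, output_folder):
    """Return a shell command string that is safe for paths with spaces."""
    return "CRISPResso -r1 {} -o {}".format(
        shlex.quote(fastq),          # left unchanged when no quoting is needed
        shlex.quote(output_folder),  # wrapped in quotes when it contains spaces
    )
```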

commit 432f163ac68b9a650d1fd326171aadc505ee87f4
Author: Kendell Clement <[email protected]>
Date:   Tue May 24 23:38:36 2022 -0400

    CRISPRessoBatch fills NA values in batch settings

    NA values in CRISPRessoBatch are filled with the value from args - either the default value or the value from the command line args (if set)
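
The fill rule above, sketched with plain dictionaries rather than the data frame that the batch code presumably uses. The function names and the `window_size` column in the usage note are hypothetical; `arg_values` stands for the parsed arguments, which already hold either the default or the command-line value.

```python
import math


def is_na(value):
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def fill_batch_na(batch_rows, arg_values):
    """Return copies of the batch rows with missing settings filled from args."""
    filled = []
    for row in batch_rows:
        new_row = dict(row)
        for key, value in row.items():
            if is_na(value) and key in arg_values:
                new_row[key] = arg_values[key]
        filled.append(new_row)
    return filled
```

For example, a row with `window_size` left empty would receive the value from `arg_values`, while a row that sets it explicitly keeps its own value.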

commit 6de774adbad3aa8cd99d07b0ba7692984b356cd4
Author: kclem <[email protected]>
Date:   Mon May 23 14:18:02 2022 -0400

    Fix file naming bug for HDR outputs

    In html file, figures 4e and 4f incorrectly referenced figure 4d. This fixes this bug.

commit b88fec0668a4082a12ead3d26582e86d829dd7cc
Author: Kendell Clement <[email protected]>
Date:   Sat May 21 00:32:15 2022 -0400

    For bam_output, fix bug that wrote unaligned lines twice

commit 3564e77ebcdedb4b01cc01dcca18ba3221fac67c
Author: Kendell Clement <[email protected]>
Date:   Thu May 19 16:32:18 2022 -0400

    Update README with CRISPRessoPooled headers and bam_output parameters

commit bc08d81f17cb1929d1c37a1773cffcf36fb12fe2
Author: Kendell Clement <[email protected]>
Date:   Thu May 19 16:11:30 2022 -0400

    Add more links to tools

commit 006c497a379ecd94b017a883a5db887861e1586a
Author: Kendell Clement <[email protected]>
Date:   Thu May 19 16:08:14 2022 -0400

    Add links to tools

commit dc8243373ad00d6bd467fc30c59942596ff0c5d6
Author: Kendell Clement <[email protected]>
Date:   Mon May 16 21:38:06 2022 -0400

    fastq_to_bam implementation (#219)

commit e88b6833977c6b2768299e0b2e7af623e3a9ae7c
Author: Kendell Clement <[email protected]>
Date:   Sun May 8 02:14:13 2022 -0400

    Fix bug for when guides don't agree in CRISPRessoAggregate

commit 7eb763116a8c60603f1cd654645215767ee8eb52
Author: Kendell Clement <[email protected]>
Date:   Thu May 5 03:28:21 2022 -0400

    Fix bug for case of empty summary plots in report generation

commit 0324fa67d14ed945f0c9531d9bcf73ebcf4ca042
Author: Kendell Clement <[email protected]>
Date:   Thu May 5 03:28:02 2022 -0400

    Create report for number of significant bases in CRISPRessoCompare

commit e3c9d0026a9ee6732f3ed6bdcf2a824850d7e66a
Author: Kendell Clement <[email protected]>
Date:   Wed May 4 22:43:11 2022 -0400

    Update pickle to json in readme and CRISPRessoPooledWGSCompare

commit 1553f7977c12bf1091a20ca55b878bccfb739b61
Author: Kendell Clement <[email protected]>
Date:   Wed May 4 18:10:04 2022 -0400

    Merge pull request #4 from pinellolab/master (#218)

commit bcecbfc047d294e26f381a6668e08cb4db24445c
Merge: 15b0e05b bb13e007
Author: Kendell Clement <[email protected]>
Date:   Wed May 4 18:06:37 2022 -0400

    Merge branch 'master' into master

commit bb13e007738d6e7a4909e01f03daff592f334f36
Merge: af4ab6e8 d0b41483
Author: Kendell Clement <[email protected]>
Date:   Wed May 4 17:59:32 2022 -0400

    Merge branch 'master' of https://github.com/edilytics/CRISPResso2

commit 15b0e05b9e03bbec5236e58776ddf9aa2f93180e
Author: Kendell Clement <[email protected]>
Date:   Wed May 4 17:54:52 2022 -0400

    2 flexible pooled input (#217)

    * Batch type coerce and r2 file check

    * Upgrade tabs for bootstrap5

    * Update readme with additional pooled amplicon file headers

    Co-authored-by: Samuel Nichols <[email protected]>

commit d0b41483bee704940ba60c58289f412b04c71659
Author: Kendell Clement <[email protected]>
Date:   Wed May 4 13:43:43 2022 -0400

    Update README.md

commit ce49fab5301cb73ba0daf6c765e350eb083c76f1
Merge: 5f909713 b913fcb4
Author: Kendell Clement <[email protected]>
Date:   Wed May 4 13:40:30 2022 -0400

    Merge pull request #3 from edilytics/2-flexible-pooled-input

    Add flexibility to CRISPRessoPooled amplicon input by allowing headers. Also, prime editing and quantification window coordinate parameters can be passed to CRISPRessoPooled.

commit b913fcb402a8ba3106c3ff7913563a33d8d19fca
Author: Kendell Clement <[email protected]>
Date:   Wed May 4 13:38:25 2022 -0400

    Update CRISPRessoPooledCORE.py

    Replace process to read header, increase flexibility for column order

commit 945bf31f16530b7ce25b89095b2c7005bf146117
Merge: 7b8f6788 5f909713
Author: Kendell Clement <[email protected]>
Date:   Wed May 4 12:45:24 2022 -0400

    Merge branch 'master' into 2-flexible-pooled-input

commit 5f9097133765736a7c2fe3c8e9b730845fed0b70
Author: Kendell Clement <[email protected]>
Date:   Wed May 4 12:23:44 2022 -0400

    Version bump to 2.2.8

commit c4a94ce0e06c6ebae13e128fbe6b708e635121c4
Author: Kendell Clement <[email protected]>
Date:   Wed May 4 00:13:17 2022 -0400

    Fix summary plot representation for multi reports

    * fixed old reference to make_multi_report which called old summary plot format
    * renamed summary_plot to summary_plots to reflect a dict with multiple plots

commit 62900e9ae6fa37ce99a04f12a63ed5c912f75042
Author: Cole Lyman <[email protected]>
Date:   Tue May 3 20:47:52 2022 -0600

    Large aggregation (#192)

    * Squashed commit of the following:

    commit 8564eb03f0d9e62abf4b7528baf5c2ae296be8f9
    Merge: f6ef62c 07cc7d8
    Author: Kendell Clement <[email protected]>
    Date:   Tue Jan 11 16:20:15 2022 -0500

        Merge branch 'indel-alignment-fix' of https://github.com/edilytics/CRISPResso2 into indel-alignment-fix

    commit 07cc7d856ab3fcbbaa5381f17f29568192388887
    Author: Cole Lyman <[email protected]>
    Date:   Fri Dec 10 15:29:59 2021 -0700

        Fix bug in `find_indels_substitutions`

        This bug occurred when there was a deletion at the end of a sequence, and was
        thus not properly accounted for.

    commit f6ef62cfdf909adac1b10ea86555cd218f8b2a74
    Author: Cole Lyman <[email protected]>
    Date:   Fri Dec 10 15:29:59 2021 -0700

        Fix bug in `find_indels_substitutions`

        This bug occurred when there was a deletion at the end of a sequence, and was
        thus not properly accounted for.

    commit 7212f87f4be60057a6c848947ff6b5efde132a25
    Author: Cole Lyman <[email protected]>
    Date:   Fri Dec 10 15:26:17 2021 -0700

        Add a unit test for `find_indels_substitutions`

        This unit test checks for deletions at the end of a sequence, which are
        inherently outside of the include_indx_set window.

    commit d50b4e903b973c71a275e31d470b40e59280ee13
    Author: Cole Lyman <[email protected]>
    Date:   Fri Dec 10 15:03:22 2021 -0700

        Fix a bug in `find_indels_substitutions`

        The bug that this commit fixes is when an insertion occurs at the edge of the
        include indexes. The trouble with this earlier was that it was using the `idx`
        to calculate the size of the insertion, but the `idx` wasn't being incremented
        anymore because it was outside of the include window.

    commit 4db066f7bc333b7662a9232ac732ebb33ac3ace8
    Author: Cole Lyman <[email protected]>
    Date:   Fri Dec 10 15:01:39 2021 -0700

        Add test case for `find_indels_substitutions`

        This test case is extracted from the CRISPRessoBatch integration test and
        provides an example where there is an insertion at the edge of the include
        index.

    commit 3b3a7417f5bbd6c2785a2af54a47e01d2e820451
    Author: Cole Lyman <[email protected]>
    Date:   Fri Dec 10 11:37:07 2021 -0700

        Fix bug in CRISPRessoCompare where sample names were not properly set

        This was a place where it was (partially) missed during the crispresso2_info
        object refactoring.

    commit e9f5eff3d95b676b5ee2e23371a5604f600d34b2
    Author: Cole Lyman <[email protected]>
    Date:   Fri Dec 10 15:26:17 2021 -0700

        Add a unit test for `find_indels_substitutions`

        This unit test checks for deletions at the end of a sequence, which are
        inherently outside of the include_indx_set window.

    commit d4d45a918254ab19a7e7956e9e731389c6f36ecb
    Author: Cole Lyman <[email protected]>
    Date:   Fri Dec 10 15:03:22 2021 -0700

        Fix a bug in `find_indels_substitutions`

        The bug that this commit fixes is when an insertion occurs at the edge of the
        include indexes. The trouble with this earlier was that it was using the `idx`
        to calculate the size of the insertion, but the `idx` wasn't being incremented
        anymore because it was outside of the include window.

    commit 13f00bb40239c83e6e5cf844561fdb7000d3d9ab
    Author: Cole Lyman <[email protected]>
    Date:   Fri Dec 10 15:01:39 2021 -0700

        Add test case for `find_indels_substitutions`

        This test case is extracted from the CRISPRessoBatch integration test and
        provides an example where there is an insertion at the edge of the include
        index.

    commit 659ae34e8fd106f7ecc163b5bea0b5a80ab0283c
    Author: Cole Lyman <[email protected]>
    Date:   Fri Dec 10 11:37:07 2021 -0700

        Fix bug in CRISPRessoCompare where sample names were not properly set

        This was a place where it was (partially) missed during the crispresso2_info
        object refactoring.

    * Add parameter `--suppress_batch_summary_plots`

    If many runs are run at the same time, batch summary plots may fail because they are too large for matplotlib. This parameter `--suppress_batch_summary_plots` allows individual runs to be plotted, but suppresses batch summary plots that may otherwise be too big.

    * Pep formatting cleanup

    * Add summary nucleotide plots to aggregate

    * Aggregate plots are paginated

    * Update CRISPRessoAggregateCORE.py

    Remove max sample limit for plotting

    * Add --max_samples_per_summary_plot to CRISPRessoAggregate

    Parameterize the max number of samples to plot on each page of reports. Additional PDFs will be created with this number of samples on them.

    * Add plotly function to plot an interactive heatmap

    * Fix deprecated numpy type to suppress warning

    * Add plotting of heatmaps to CRISPRessoAggregateCORE to summarize modification types

    These heatmaps are interactive (zoomable and pannable) and show for each sample
    the percentage of insertions, substitutions, and deletions.

    * Add the heatmap summaries to the CRISPRessoAggregate report

    * Update Bootstrap to 5.1.3

    This is mainly so that we can use the fullscreen modal functionality in this version.

    * Move the plotly heatmaps to a Bootstrap modal

    * Fix bug where plots were not filling up entire modal.

    I have tried countless different ways for this to work, and this is the best
    that I can come up with. After the modal is opened it triggers the plot to
    resize, and then for some reason you need to trigger the resize event. I think
    this is because a `div` changing size won't actually trigger the resizing of the
    plot (and neither will just calling `Plotly.Plots.resize`...?!).

    * Update the axis labels and add autosize to plotly heatmaps

    I'm pretty sure the autosize doesn't do anything, but it is there for good
    measure.

    * Abandon attempts to make plots fullscreen

    This includes removing the Bootstrap modal (two out of the three plots would
    resize properly and I couldn't figure out a way to have the plot displayed
    outside of the modal). I have left in some javascript to make the plot
    fullscreen, but I couldn't get the formatting quite right and the plot wasn't
    much bigger in the fullscreen version because there was a ton of space between
    the plot and the heatmap. If some brave soul would like to tackle it, feel free!

    * Rename and refactor how plot data is passed around

    I have consolidated how the plot data is passed around, so that now you can pass
    in only one dict with all of the information instead of 4 or 5 separate
    parameters. I also renamed the `heatmap_plot_*` to
    `allele_modification_heatmap_*`.

    * Implement the line plot version of the modification percentages

    This also includes correctly resizing the plot when the line plot tab is
    selected!

    * Change default `max_samples_per_summary_plot` to be 150 instead of 250

    * Remove extra assignments of `this_number_samples` and suppress plot

    The plot that is suppressed is the large nucleotide quilt when there is a large
    number of samples. Is it okay to suppress this plot @kclem?

    * Implement parallel plotting in CRISPRessoAggregate

    * Fix sample indexing error and heatmap scaling for large number of samples

    * Add plotly requirement to setup.py

    * Remove space around vertical barcharts

    * Add scrollbar to long images in multiReport

    * Fill in default (empty) values to allele modification plots

    When not running CRISPRessoAggregate, default values for the
    `allele_modification_heatmap_plot` and `allele_modification_lin_plot`
    dictionaries will be set so that the template can be properly rendered.

    * Include CRISPRessoBatch in the refactor of how summary_plot dicts are handled

    * Update dockerfile for new docker

    * minor bug fixes for plotCustomAllelePlot.py to work with Python3 (#212)

    * Allow for flexible parsing of quant window coordinates

    * CRISPRessoPooled debug flash command, fix pep formatting

    * Set flexiguide homology parameter type to int

    * Coerce ints in batch file checking (#200)

    * Batch type coerce and r2 file check

    * Revert "Batch type coerce and r2 file check"

    This reverts commit f91736688ea9739cf3063e3601c52ad6da1116a4.

    * Coerce int values

    * Handle multiple qwcs in batch mode

    If multiple qwcs were provided in batch mode, a parsing error would occur. This fixes this bug.

    * Fix bug from old pandas for int cols

    Evidently old pandas versions throw an error if a column doesn't exist. This checks to see if the column exists before the values are set.

    * Create allele modification heatmaps and line plots in CRISPRessoBatch

    * Add allele modification heatmaps and line plots to CRISPRessoBatch

    * Make all plots in CRISPRessoBatch run in parallel

    * Make `--suppress_batch_summary_plots` store true

    Also, only open and shutdown the process pool when necessary.

    * Add blank values for allele_modification entries when not present

    Co-authored-by: Kendell Clement <[email protected]>
    Co-authored-by: dharjanto <[email protected]>
    Co-authored-by: Samuel Nichols <[email protected]>

commit f67376fc9ab0e407d4086aa42fd1c77706ebc9c0
Author: Kendell Clement <[email protected]>
Date:   Fri Apr 15 00:46:30 2022 -0400

    Fix bug from old pandas for int cols

    Evidently old pandas versions throw an error if a column doesn't exist. This checks to see if the column exists before the values are set.

commit b34fe2956ff88629809b2434878028723dfc4895
Author: Kendell Clement <[email protected]>
Date:   Thu Apr 14 23:58:07 2022 -0400

    Handle multiple qwcs in batch mode

    If multiple qwcs were provided in batch mode, a parsing error would occur. This fixes this bug.

commit c94e3b9f2e301bda91e9c1e6f4ef794b33b5dbf0
Author: Samuel Nichols <[email protected]>
Date:   Thu Apr 14 21:48:32 2022 -0600

    Coerce ints in batch file checking (#200)

    * Batch type coerce and r2 file check

    * Revert "Batch type coerce and r2 file check"

    This reverts commit f91736688ea9739cf3063e3601c52ad6da1116a4.

    * Coerce int values

commit fc4542491bb86eb143db0044a848a56234403496
Author: Kendell Clement <[email protected]>
Date:   Thu Apr 14 22:13:23 2022 -0400

    Set flexiguide homology parameter type to int

commit 23fe2aa8e26067d1bcf36bfafc67e023c7588d2f
Author: Kendell Clement <[email protected]>
Date:   Thu Apr 14 22:12:37 2022 -0400

    CRISPRessoPooled debug flash command, fix pep formatting

commit d292d33d8c1fa3bfd2cee656643fd47bcdab161d
Author: Kendell Clement <[email protected]>
Date:   Thu Apr 14 22:00:19 2022 -0400

    Allow for flexible parsing of quant window coordinates

commit e1667cb53a7ea6fbb33369c8530a78639ed423ec
Author: dharjanto <[email protected]>
Date:   Mon Apr 11 22:08:21 2022 -0400

    minor bug fixes for plotCustomAllelePlot.py to work with Python3 (#212)

commit 7b8f6788da18f6ab173fa3c3d10f4ab6bb2acc26
Author: Samuel Nichols <[email protected]>
Date:   Fri Apr 8 10:21:00 2022 -0600

    Update README

commit 9bc24cd0474ed9f398dff64274d3181c4b2f8637
Author: Samuel Nichols <[email protected]>
Date:   Tue Mar 29 11:25:09 2022 -0600

    Using Amplicon_Name

commit 88ac5d72074b3da63de035e02c911ce34cd29414
Merge: b6057a2d e5afa478
Author: Samuel Nichols <[email protected]>
Date:   Mon Mar 28 22:32:09 2022 -0600

    Merge remote-tracking branch 'origin/master' into 2-flexible-pooled-input

commit b6057a2d54cb8637ff0900416de8e2de72213f76
Author: Samuel Nichols <[email protected]>
Date:   Mon Mar 28 20:53:05 2022 -0600

    Printing info statements for matched headers

commit af4ab6e8507d7aa4b7b68f217a458e0d9c966f55
Merge: bbb7d6f0 51a943c3
Author: Cole Lyman <[email protected]>
Date:   Fri Mar 25 09:44:13 2022 -0600

    Merge branch 'pinellolab:master' into master

commit 3c1eb012fc02563e3e963f17a62c7e932f5bcddc
Author: Samuel Nichols <[email protected]>
Date:   Thu Mar 24 12:31:43 2022 -0600

    Debugging and column checking

commit 0b47acbc592a6df6adf14641357b2104b76be691
Author: Samuel Nichols <[email protected]>
Date:   Wed Mar 23 09:42:51 2022 -0600

    New variables added to pooled

commit a0ff3a44d6d19d7b37f91919b5c0180206f72d53
Author: Samuel Nichols <[email protected]>
Date:   Mon Mar 21 09:32:28 2022 -0600

    Read as string not bytes

commit 710675fc3c0307e21103abd604315b47ff80a894
Author: Samuel Nichols <[email protected]>
Date:   Wed Mar 16 13:51:30 2022 -0600

    Adding command building for new options

commit f386818a48e5c840bd567611e6f1320c8146cac7
Author: Samuel Nichols <[email protected]>
Date:   Wed Mar 16 10:08:33 2022 -0600

    Comment out df_template.iloc instance

commit eb5e309da57c8b96cd760728ddbf67be05f30d1c
Author: Samuel Nichols <[email protected]>
Date:   Wed Mar 16 09:59:19 2022 -0600

    Potential solution for flexible headers

commit 51a943c3a8f8181963acc420e75a5e8ee103cf7c
Author: Kendell Clement <[email protected]>
Date:   Tue Mar 15 11:00:46 2022 -0400

    CRISPRessoPooled pep formatting and fix

    CRISPRessoPooled doesn't re-count reads if it has been run once and the `aligned_pooled_bam` is provided as input
    pep code formatting changes

commit bbb7d6f0907aa13518d20e7f470e7de518b825f4
Merge: ddbd39f0 5a10d638
Author: Kendell Clement <[email protected]>
Date:   Tue Mar 15 10:23:38 2022 -0400

    Merge branch 'master' of https://github.com/edilytics/CRISPResso2

commit 5a10d638c638f21f8a2934955e92ef7e117b889e
Author: Kendell Clement <[email protected]>
Date:   Sat Feb 26 14:21:57 2022 -0500

    Move metadata for bam input and output

commit e5afa4784d5330a1dc95c5deafcd9217edeac631
Author: Samuel Nichols <[email protected]>
Date:   Wed Feb 16 10:20:24 2022 -0700

    Coerce int values

commit ede7d85b50055311908000578c76a1860ae9de4d
Author: Samuel Nichols <[email protected]>
Date:   Wed Feb 16 10:18:29 2022 -0700

    Revert "Batch type coerce and r2 file check"

    This reverts commit f91736688ea9739cf3063e3601c52ad6da1116a4.

commit f91736688ea9739cf3063e3601c52ad6da1116a4
Author: Samuel Nichols <[email protected]>
Date:   Wed Feb 16 10:10:52 2022 -0700

    Batch type coerce and r2 file check

commit 7b4a310b0f8b64c00e02eca3d522ad50d39b43ae
Author: Kendell Clement <[email protected]>
Date:   Tue Feb 15 22:18:05 2022 -0500

    Reiterate WGS region file is tab-separated

    Add note to WGS description that region file should be tab-separated. Closes #199

commit b8497542e388ad401d0815d426f27abc3201a76d
Author: kclem <[email protected]>
Date:   Fri Feb 11 15:07:14 2022 -0500

    Extend x-axis to longest scaffold incorporation length

commit ab7248947afade089809c74bfe6e9d5394e8f6dc
Author: kclem <[email protected]>
Date:   Wed Feb 9 17:05:11 2022 -0500

    Fix prime editing indexing for plots

commit ddbd39f06b262d5ebd2cc69e116c08b22b6bd84e
Merge: a7ffd468 442a48c7
Author: Kendell Clement <[email protected]>
Date:   Thu Jan 13 15:35:36 2022 -0500

    Merge branch 'pinellolab:master' into master

commit 442a48c7f4c62ec2ebc95fe268475e5e2a4b2f0c
Author: Cole Lyman <[email protected]>
Date:   Tue Jan 11 15:28:28 2022 -0700

    Indel alignment fix (#182)

    * Fix bug in CRISPRessoCompare where sample names were not properly set

    This was a place where it was (partially) missed during the crispresso2_info
    object refactoring.

    * Add test case for `find_indels_substitutions`

    This test case is extracted from the CRISPRessoBatch integration test and
    provides an example where there is an insertion at the edge of the include
    index.

    * Fix a bug in `find_indels_substitutions`

    The bug that this commit fixes is when an insertion occurs at the edge of the
    include indexes. The trouble with this earlier was that it was using the `idx`
    to calculate the size of the insertion, but the `idx` wasn't being incremented
    anymore because it was outside of the include window.

    * Add a unit test for `find_indels_substitutions`

    This unit test checks for deletions at the end of a sequence, which are
    inherently outside of the include_indx_set window.

    * Fix bug in `find_indels_substitutions`

    This bug occurred when there was a deletion at the end of a sequence, and was
    thus not properly accounted for.

    * Squashed commit of the following:

    commit 8564eb03f0d9e62abf4b7528baf5c2ae296be8f9
    Merge: f6ef62c 07cc7d8
    Author: Kendell Clement <[email protected]>
    Date:   Tue Jan 11 16:20:15 2022 -0500

        Merge branch 'indel-alignment-fix' of https://github.com/edilytics/CRISPResso2 into indel-alignment-fix

    commit 07cc7d856ab3fcbbaa5381f17f29568192388887
    Author: Cole Lyman <[email protected]>
    Date:   Fri Dec 10 15:29:59 2021 -0700

        Fix bug in `find_indels_substitutions`

        This bug occurred when there was a deletion at the end of a sequence, and was
        thus not properly accounted for.

    commit f6ef62cfdf909adac1b10ea86555cd218f8b2a74
    Author: Cole Lyman <[email protected]>
    Date:   Fri Dec 10 15:29:59 2021 -0700

        Fix bug in `find_indels_substitutions`

        This bug occurred when there was a deletion at the end of a sequence, and was
        thus not properly accounted for.

    commit 7212f87f4be60057a6c848947ff6b5efde132a25
    Author: Cole Lyman <[email protected]>
    Date:   Fri Dec 10 15:26:17 2021 -0700

        Add a unit test for `find_indels_substitutions`

        This unit test checks for deletions at the end of a sequence, which are
        inherently outside of the include_indx_set window.

    commit d50b4e903b973c71a275e31d470b40e59280ee13
    Author: Cole Lyman <[email protected]>
    Date:   Fri Dec 10 15:03:22 2021 -0700

        Fix a bug in `find_indels_substitutions`

        The bug that this commit fixes is when an insertion occurs at the edge of the
        include indexes. The trouble with this earlier was that it was using the `idx`
        to calculate the size of the insertion, but the `idx` wasn't being incremented
        anymore because it was outside of the include window.

    commit 4db066f7bc333b7662a9232ac732ebb33ac3ace8
    Author: Cole Lyman <[email protected]>
    Date:   Fri Dec 10 15:01:39 2021 -0700

        Add test case for `find_indels_substitutions`

        This test case is extracted from the CRISPRessoBatch integration test and
        provides an example where there is an insertion at the edge of the include
        index.

    commit 3b3a7417f5bbd6c2785a2af54a47e01d2e820451
    Author: Cole Lyman <[email protected]>
    Date:   Fri Dec 10 11:37:07 2021 -0700

        Fix bug in CRISPRessoCompare where sample names were not properly set

        This was a place where it was (partially) missed during the crispresso2_info
        object refactoring.

    commit e9f5eff3d95b676b5ee2e23371a5604f600d34b2
    Author: Cole Lyman <[email protected]>
    Date:   Fri Dec 10 15:26:17 2021 -0700

        Add a unit test for `find_indels_substitutions`

        This unit test checks for deletions at the end of a sequence, which are
        inherently outside of the include_indx_set window.

    commit d4d45a918254ab19a7e7956e9e731389c6f36ecb
    Author: Cole Lyman <[email protected]>
    Date:   Fri Dec 10 15:03:22 2021 -0700

        Fix a bug in `find_indels_substitutions`

        The bug that this commit fixes is when an insertion occurs at the edge of the
        include indexes. The trouble with this earlier was that it was using the `idx`
        to calculate the size of the insertion, but the `idx` wasn't being incremented
        anymore because it was outside of the include window.

    commit 13f00bb40239c83e6e5cf844561fdb7000d3d9ab
    Author: Cole Lyman <[email protected]>
    Date:   Fri Dec 10 15:01:39 2021 -0700

        Add test case for `find_indels_substitutions`

        This test case is extracted from the CRISPRessoBatch integration test and
        provides an example where there is an insertion at the edge of the include
        index.

    commit 659ae34e8fd106f7ecc163b5bea0b5a80ab0283c
    Author: Cole Lyman <[email protected]>
    Date:   Fri Dec 10 11:37:07 2021 -0700

        Fix bug in CRISPRessoCompare where sample names were not properly set

        This was a place where it was (partially) missed during the crispresso2_info
        object refactoring.

    * Fix bug in `find_indels_substitutions`

    Th…
Snicker7 added a commit to edilytics/CRISPResso2 that referenced this pull request Apr 4, 2024
commit 6f4b0ad885e1d72413a034bf7abaaa0360a3b0c4
Author: Samuel Nichols <[email protected]>
Date:   Thu Apr 4 15:18:09 2024 -0600

    Batch d3 clean (#55)

    * imports C2Pro plots if available

    * added --use_matplotlib flag

    * added C2Pro
    matched api function signatures

    * added api args for plotly

    * added **kwargs

    * renamed config to custom_config, more specificity

    * added backend flag for plotly kaleido

    * added pro_installed boolean for templates, added plotly dependency to report templates

    * Squashed commit of the following:

    commit c909ea3b34e87ce637e00dac075d2bb2f8bfb954
    Author: McKay <[email protected]>
    Date:   Thu Feb 15 15:55:23 2024 -0700

        added plotly dependency for pro

    commit 76b3601f6a0144f100266153f1c999e0c5de65de
    Author: Samuel Nichols <[email protected]>
    Date:   Fri Jan 12 09:56:19 2024 -0700

        Squashed commit of the following:

        commit 603f2eff9d1aa21ae95f3e134da303b8018d3a33
        Author: Samuel Nichols <[email protected]>
        Date:   Fri Jan 12 09:48:20 2024 -0700

            fix guardrails partial

        commit 22fc03183a8070c30dfb74d5c23575ac19019855
        Author: Samuel Nichols <[email protected]>
        Date:   Fri Jan 12 08:54:01 2024 -0700

            Add guardrail partial

        commit e55f6b21972b578261bc5a864ce1d653d98f9e34
        Author: Samuel Nichols <[email protected]>
        Date:   Mon Jan 8 07:50:59 2024 -0700

            Functional guardrails, needs reports update

        commit 6e968e9699ed59a47d88191d03768e042d8b60a4
        Merge: 32b49685 e948ce10
        Author: Samuel Nichols <[email protected]>
        Date:   Mon Dec 18 13:34:36 2023 -0700

            Merge branch 'guardrails-clean-history' of https://github.com/edilytics/CRISPResso2 into guardrails-clean-history

        commit 32b49685da320501dad2b0ebbb57887b66220ba8
        Author: Samuel Nichols <[email protected]>
        Date:   Fri Dec 15 15:27:04 2023 -0700

            Include guardrail functions

        commit 4e309cf6f732565d635de3d4c5d074ada3027e2d
        Author: Cole Lyman <[email protected]>
        Date:   Mon Dec 18 10:51:55 2023 -0700

            Refactor to use CRISPRessoReports module

        commit e648dc087c0055bc5d2fca13c64071a371dea941
        Author: Cole Lyman <[email protected]>
        Date:   Mon Dec 18 10:51:11 2023 -0700

            Add CRISPRessoReports subtree

        commit e948ce107ebb0d1d99010ed12e937f34b5e607d4
        Author: Samuel Nichols <[email protected]>
        Date:   Fri Dec 15 15:27:04 2023 -0700

            Include guardrail functions

        commit d33c748871a625facfe8d792e29c77ab9779138f
        Author: Kendell Clement <[email protected]>
        Date:   Tue Nov 7 16:31:06 2023 -0700

            Include parameter --assign_ambiguous_alignments_to_first_reference in readme

        commit a1435f7f491a6a61434f3051e39f39a4c9bf1edc
        Author: Kendell Clement <[email protected]>
        Date:   Wed Oct 11 17:17:30 2023 -0600

            Enable quantification by sgRNA (#348)

            This PR includes:
            - storing the sgRNA-specific editing locations in the crispresso2_info object. Previously, each amplicon would record the indices of quantification windows across the guide, but not for individual guides. This stores the information for each guide in crispresso2_info['results']['refs'][reference_name]['sgRNA_include_idxs']
            - a script (count_sgRNA_specific_edits.py) to parse through an allele table output from a completed CRISPResso run (`--write_detailed_allele_table` flag required) to count edits in each sgRNA separately.

            I don't have a good double-edited sample handy, but it can be run on the demo HDR data [hdr.fastq.gz](http://crispresso.pinellolab.org/static/demo/hdr.fastq.gz) using the command:

            ```

            CRISPResso -r1 hdr.fastq.gz -a acatttgcttctgacacaactgtgttcactagcaacctcaaacagacaccatggtgcatctgactcctgTggagaagtctgccgttactgccctgtggggcaaggtgaacgtggatgaagttggtggtgaggccctgggcaggttggtatcaaggtta -e acatttgcttctgacacaactgtgttcactagcaacctcaaacagacaccatggtgcaCctgactccGgaggagaagtctgccgttactgcGctgtggggcaaggtgaacgtggatgaagttggtggtgaggccctgggcaggttggtatcaaggtta -c atggtgcatctgactcctgTggagaagtctgccgttactgccctgtggggcaaggtgaacgtggatgaagttggtggtgaggccctgggcag -g TGCACCATGGTGTCTGTTTG,GATGAAGTTGGTGGTGAGGCCC --write_detailed_allele_table  -n hdr3 -p max -gn guide1,guide2
            ```

            ```
            python CRISPResso2/scripts/count_sgRNA_specific_edits.py -f CRISPResso_on_hdr3
            ```

            This produces:
            ```
            Processed 25000 alleles
            Reference: Reference (2391/23415 modified reads)
                    UNMODIFIED: 21024
                    MODIFIED guide1: 2359
                    MODIFIED guide2: 32
            Reference: HDR (856/1577 modified reads)
                    UNMODIFIED: 721
                    MODIFIED guide1: 854
                    MODIFIED guide1 + guide2: 1
                    MODIFIED guide2: 1
             ```
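
As a rough illustration of the per-guide counting described in this commit, the sketch below labels an allele by which guides' quantification windows contain a modified position. The nested keys mirror the `sgRNA_include_idxs` entry named above, but the index values, the `label_allele` helper, and the assumption that the entry holds one list of indices per guide are hypothetical and are not taken from count_sgRNA_specific_edits.py.

```python
def label_allele(modified_positions, sgRNA_include_idxs, guide_names):
    """Return a label such as 'MODIFIED guide1 + guide2' or 'UNMODIFIED'."""
    modified = set(modified_positions)
    # A guide counts as modified if any edit falls inside its window.
    hit_guides = [
        name
        for name, idxs in zip(guide_names, sgRNA_include_idxs)
        if modified.intersection(idxs)
    ]
    if not hit_guides:
        return "UNMODIFIED"
    return "MODIFIED " + " + ".join(hit_guides)


# Hypothetical excerpt of a crispresso2_info object (values are made up).
crispresso2_info = {
    "results": {"refs": {"Reference": {"sgRNA_include_idxs": [[40, 41], [110, 111]]}}}
}
idxs = crispresso2_info["results"]["refs"]["Reference"]["sgRNA_include_idxs"]
print(label_allele([41], idxs, ["guide1", "guide2"]))
```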

        commit 2e3da02fdbed2fa8ae02a277763d65a502459827
        Author: Cole Lyman <[email protected]>
        Date:   Tue Oct 10 15:29:08 2023 -0600

            changed tuple to list for matplotlib change (#31) (#346)

            Co-authored-by: mbowcut2 <[email protected]>

        commit cd3c332135fe4db0f9218e3d87263d5c65838ed9
        Author: Kendell Clement <[email protected]>
        Date:   Sun Oct 1 01:54:46 2023 -0600

            rename script to camel case

        commit 7c719d65fb36ac7654db9040f226564ea28fcab9
        Author: Kendell Clement <[email protected]>
        Date:   Sun Oct 1 01:53:44 2023 -0600

            Add new script for counting high quality bases

        commit f97cd2795e89464bcc9321ccfdbca3e6af2bcb4f
        Author: Kendell Clement <[email protected]>
        Date:   Thu Sep 14 15:15:30 2023 -0600

            Prime editing alignment params (#336)

            Adds two parameters to control alignment of pegRNA components: --prime_editing_gap_open_penalty and --prime_editing_gap_extend_penalty.

            CRISPResso checks to see whether the pegRNA spacer and extension sequence are in the correct orientation, but sometimes they could align in the incorrect orientation with a higher score (e.g. via insertion of multiple gaps, whereas a single long gap would be preferred). Introducing these two parameters allows users to adjust the alignment parameters specifically for these prime-editing checks without adjusting the global alignment parameters which will be applied to reads that are aligned to the WT reference/prime-editing reference sequences.

            The new prime_editing_gap_open_penalty is set to -50, a higher gap open penalty than the default needleman_wunsch_gap_open penalty (-20). This commit breaks backward-reproducibility, but mostly in the checking of pegRNA component orientation - so previously some CRISPResso runs would have failed and produced an error, but now they will (hopefully) succeed. To achieve complete backward reproducibility, add the flag --prime_editing_gap_open_penalty -20 to runs.
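
To see why a larger gap-open penalty favors one long gap over several short ones, here is a small arithmetic sketch. It assumes a common affine convention (the open penalty is charged for the first gapped position and the extend penalty for each additional one) and an illustrative extend penalty of -2. Neither assumption is confirmed by the commit, and CRISPResso's aligner may score gaps differently.

```python
def affine_gap_score(gap_lengths, gap_open, gap_extend):
    """Total penalty for a set of gaps under the assumed affine convention."""
    return sum(gap_open + gap_extend * (length - 1) for length in gap_lengths)


one_long_gap = [6]            # a single 6-position gap
three_short_gaps = [2, 2, 2]  # the same 6 positions split into three gaps

for gap_open in (-20, -50):
    long_score = affine_gap_score(one_long_gap, gap_open, gap_extend=-2)
    split_score = affine_gap_score(three_short_gaps, gap_open, gap_extend=-2)
    # The last column is the advantage of the single long gap.
    print(gap_open, long_score, split_score, long_score - split_score)
```

Under these assumptions the advantage of the single gap grows from 36 to 96 score units when the open penalty moves from -20 to -50.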

        commit 64cbf36dae85cffa2c15e73f2a7ee8aa1077d917
        Author: Cole Lyman <[email protected]>
        Date:   Thu Sep 7 16:43:30 2023 -0600

            Fix samtools piping (#325)

            * Remove samtools pipe stderr to stdout

            Sometimes some of the libraries that samtools depends on don't have the correct
            version information, and as such samtools will report this to stderr when run.
            Because we pipe the output of samtools, we expect it to be valid SAM format, but
            when these library version messages are reported, it breaks CRISPRessoWGS.

            * Remove extra spacing at end of lines and add missing comma in WGS

            * Log stderr from samtools in CRISPRessoWGS
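
The effect of keeping the two streams apart can be sketched with the standard library alone. The child process below is a hypothetical stand-in for samtools (it is not a real samtools call): it writes one SAM-like record to stdout and a version warning to stderr, and the parent captures them separately so the warning never reaches the SAM parser.

```python
import subprocess
import sys

# Hypothetical child standing in for samtools: one record on stdout,
# one library version warning on stderr.
child_code = (
    "import sys\n"
    "sys.stderr.write('warning: library version mismatch\\n')\n"
    "sys.stdout.write('read1\\t4\\t*\\t0\\t0\\t*\\t*\\t0\\t0\\tACGT\\tIIII\\n')\n"
)

proc = subprocess.run(
    [sys.executable, "-c", child_code],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,  # kept separate instead of stderr=subprocess.STDOUT
    text=True,
)

sam_lines = proc.stdout.splitlines()        # only records reach the parser
logged_warnings = proc.stderr.splitlines()  # warnings can be logged instead
print(sam_lines, logged_warnings)
```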

        commit 8feff4101f27406d9d88ace97d31a518276bff3f
        Author: Cole Lyman <[email protected]>
        Date:   Fri Sep 1 09:43:56 2023 -0600

            Replace link to CRISPResso schematic with raw URL in README (#329)

            * Replace link to CRISPResso schematic with raw URL

            * Add new lines to the beginning of unordered lists

        commit 2e9e6bff5bcc536d5e2ba1440d1ab96d9d47efd6
        Author: Kendell Clement <[email protected]>
        Date:   Thu Aug 10 00:52:12 2023 -0600

            Try to unbreak CircleCI

        commit ae5b95246cb0f6d66c4cbfb50cf8f5a9626b0827
        Author: Kendell Clement <[email protected]>
        Date:   Thu Aug 10 00:17:27 2023 -0600

            Center command line text messages

        commit 4d9c71ecf2248c9bb1e10430178dc318b6621c8b
        Author: Kendell Clement <[email protected]>
        Date:   Thu Aug 10 00:17:07 2023 -0600

            Fix bug in prime-editing scaffold-incorporation plotting

            If read is too short, scaffold incorporation detection will fail because it will check beyond the length of the read.

        commit 2b36a1a5c35e8a93516ce8baf464595615e0f402
        Author: Kendell Clement <[email protected]>
        Date:   Wed Aug 9 15:29:48 2023 -0600

            CRISPRessoPooled --compile_postrun_references bug fixes

        commit 3e04d1d402bcf95edd39fc7c8c9af61bb380f9db
        Author: Kendell Clement <[email protected]>
        Date:   Tue Aug 8 23:30:15 2023 -0600

            Fix missing ' in Pooled --demultiplex_only_at_amplicons

        commit 06af527f9e2020c5cf251e7f1cec0b1eca1c1664
        Author: Cole Lyman <[email protected]>
        Date:   Mon Jul 24 10:47:46 2023 -0600

            Sort pandas dataframes by # of reads and sequences so that the order is consistent (#316)

            * Make sorting stable

            * Including c files

            * Sort by #Reads instead of %Reads to avoid floating point errors

            ---------

            Co-authored-by: Samuel Nichols <[email protected]>
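
The ordering idea can be sketched without pandas: sorting on the integer read count and breaking ties on the sequence gives the same order regardless of input order, and it does not depend on a floating-point percentage. The records and the `sort_alleles` helper are hypothetical; the commit itself sorts pandas dataframes.

```python
def sort_alleles(rows):
    """Sort by descending read count, then by sequence to break ties."""
    return sorted(rows, key=lambda row: (-row["n_reads"], row["sequence"]))


rows = [
    {"sequence": "ACGT", "n_reads": 10},
    {"sequence": "AAGT", "n_reads": 10},
    {"sequence": "TTGT", "n_reads": 25},
]
ordered = [row["sequence"] for row in sort_alleles(rows)]
print(ordered)
```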

        commit de05533b3511a84f3b6b14fc2ef64db041613261
        Author: Cole Lyman <[email protected]>
        Date:   Thu Jul 6 13:54:45 2023 -0600

            Fix multiprocessing lambda pickling (#311)

            * Fix running plots in parallel

            The reason the plots were running slower before this change is because I was
            calling the plot function, not passing it to `submit`. So it was essentially
            running in serial, but worse because it was still spinning up/down the
            processes.

            * Fix multiprocessing lambda pickling (#20)

            * Refactor process_futures to be a dict

            This makes debugging much easier because you can associate the arguments to the
            future with the results.

            * Fix the pickling error when running in multiprocessing

            Only top-level functions (not lambdas) can be pickled to use in multiprocessing
            pools, thus the lambdas are converted to a regular function.

            * Further fixes to pickling multiprocessing error (#21)

            * Refactor process_futures to be a dict

            This makes debugging much easier because you can associate the arguments to the
            future with the results.

            * Fix the pickling error when running in multiprocessing

            Only top-level functions (not lambdas) can be pickled to use in multiprocessing
            pools, thus the lambdas are converted to a regular function.

            * Use Counter instead of defaultdict in CRISPRessoCORE

            * Update process_futures to dict in Batch and Aggregate
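
The pickling constraint mentioned in this commit can be checked directly with `pickle`, which multiprocessing pools use to send work to other processes. This is a minimal sketch, not the code from the commit: the commit converts lambdas to regular functions, whereas the example below uses `functools.partial` over a standard-library function as a picklable stand-in, and `is_picklable` is a hypothetical helper.

```python
import functools
import operator
import pickle


def is_picklable(obj):
    """Return True if obj survives pickle.dumps, as process pools require."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError):
        return False
    return True


double_lambda = lambda x: x * 2                      # no importable name
double_partial = functools.partial(operator.mul, 2)  # built on an importable function

print(is_picklable(double_lambda), is_picklable(double_partial))
restored = pickle.loads(pickle.dumps(double_partial))
print(restored(21))
```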

        commit ebb016dff46c280dce8c3c09e8ac0e0cc25d4d74
        Author: Kendell Clement <[email protected]>
        Date:   Mon Jul 3 17:12:09 2023 -0600

            Enable CRISPRessoPooled multiprocessing when os allows multi-thread file append

        commit 7285da0e987b77b72c8885bb35940e0f50c146bd
        Author: Kendell Clement <[email protected]>
        Date:   Fri Jun 23 16:50:33 2023 -0600

            Fix print bug for invalid fastq

        commit 9acdeac67441f9a1d55ac94b153bcb68fb89b92c
        Author: kclem <[email protected]>
        Date:   Wed Jun 21 16:03:48 2023 -0600

            Slugify before creating filename - replaces invalid characters in batch names with _
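
A minimal sketch of this kind of replacement follows. The set of characters CRISPResso treats as valid is not given here, so the pattern below (letters, digits, underscore, dot, dash) and the `slugify_name` helper are assumptions for illustration.

```python
import re


def slugify_name(name):
    """Replace every character outside [A-Za-z0-9_.-] with an underscore."""
    return re.sub(r"[^A-Za-z0-9_.\-]", "_", name)


print(slugify_name("batch 1/rep:A"))
```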

        commit f97e29c67de4c80b8d6b9cf334f363be4b514ade
        Author: Cole Lyman <[email protected]>
        Date:   Wed Jun 21 14:43:43 2023 -0600

            Add verbosity argument to CRISPRessoAggregate (#18) fixes #306 (#307)

            * Add verbosity argument to CRISPRessoAggregate (#18)

            * Allow for amplicon and guide seqs to be some variant of NA in batch (#19)

            This was discovered when attempting to infer amplicon sequences in batch mode on
            the web interface, NAs were supplied for the amplicon sequences to the sub
            CRISPResso commands.

        commit 32e1e9797da5c3033cdc588e92f06b8813961953
        Author: Mark Clement <[email protected]>
        Date:   Wed Jun 21 14:01:00 2023 -0600

            Allow for interrogation of overlapping sgRNA sites

        commit 7248ba8c4deee125ad1ec12fdf1294a84d5f6f93
        Author: Kendell Clement <[email protected]>
        Date:   Mon Jun 12 12:16:47 2023 -0600

            Check input fastq file format

            Asserts input format of fastq files - including if gzipped files are missing the gz suffix.
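
One way such a check could work is to look at the gzip magic number rather than the file suffix, then confirm the first record starts with `@`. The `sniff_fastq` helper below is a hypothetical sketch under that assumption and is not CRISPResso's own check.

```python
import gzip
import os
import tempfile


def sniff_fastq(path):
    """Return 'gzip' or 'plain'; raise if the first line does not start with '@'."""
    with open(path, "rb") as handle:
        is_gzipped = handle.read(2) == b"\x1f\x8b"  # gzip magic number
    opener = gzip.open if is_gzipped else open
    with opener(path, "rt") as handle:
        first_line = handle.readline()
    if not first_line.startswith("@"):
        raise ValueError(f"{path} does not look like a fastq file")
    return "gzip" if is_gzipped else "plain"


with tempfile.TemporaryDirectory() as tmp:
    mislabeled = os.path.join(tmp, "reads.fastq")  # gzipped, but no .gz suffix
    with gzip.open(mislabeled, "wt") as handle:
        handle.write("@read1\nACGT\n+\nIIII\n")
    detected = sniff_fastq(mislabeled)
print(detected)
```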

        commit 83c8ab8f462e7d8c1d04c08c1a398b874f517251
        Author: Kendell Clement <[email protected]>
        Date:   Mon Jun 5 13:41:55 2023 -0600

            Fix CRISPRessoArgParser

        commit 14a2c8577f566e1b72d5f4e72cd6cd22079610be
        Author: Kendell Clement <[email protected]>
        Date:   Mon Jun 5 13:29:31 2023 -0600

            Cosmetic updates for command-line use

            - version bump to 2.2.13
            - If no args are provided, the command line version will print out an abbreviated help message
            - parameters can be excluded from CRISPRessoArgParser

        commit 1cd54bc1d03360c3d8121ba9e66b3589fe1cf252
        Author: Cole Lyman <[email protected]>
        Date:   Thu May 11 14:31:47 2023 -0600

            Fix multiprocessing error, don't start pool when only using single thread (#302)

            * Update README to have consistent use of `--base_editor_output` (#16)

            * Add files via upload

            * Only start process pools when using multiple processes

            This is mainly to solve the issue when running on AWS Lambda, but this should
            improve single core performance overall.

            ---------

            Co-authored-by: Kendell Clement <[email protected]>
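
The single-process shortcut described in this commit might look like the sketch below. `run_all` and `square` are hypothetical names, and the real code paths in CRISPResso are more involved; the point is only that no pool is created when one process is requested.

```python
from concurrent.futures import ProcessPoolExecutor


def square(x):
    return x * x


def run_all(func, inputs, n_processes):
    """Run func over inputs, creating a process pool only if n_processes > 1."""
    if n_processes <= 1:
        # No pool is created, so no worker processes are started.
        return [func(item) for item in inputs]
    with ProcessPoolExecutor(max_workers=n_processes) as pool:
        return list(pool.map(func, inputs))


print(run_all(square, [1, 2, 3], n_processes=1))
```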

        commit 92a705c939b370373a70cf6ae9f1616de33288b9
        Author: Cole Lyman <[email protected]>
        Date:   Thu May 11 14:31:06 2023 -0600

            Update `base_editor` parameters in README and add Plot Harness (#301)

            * Update README to have consistent use of `--base_editor_output` (#16)

            * Add files via upload

            ---------

            Co-authored-by: Kendell Clement <[email protected]>

        commit 7d46c4490235df45c5546b1b470e4e6a99727031
        Author: Cole Lyman <[email protected]>
        Date:   Wed May 10 15:41:33 2023 -0600

            Clarify CRISPRessoWGS intended use (#303)

            * Update README to have consistent use of `--base_editor_output` (#16)

            * Add sample plotting jupyter notebook

            * Add clarifying info to CRISPRessoWGS description

            Clarify WGS usage

        commit 833a701787bb47674b3e921c38cac6189c775cf7
        Author: Kendell Clement <[email protected]>
        Date:   Thu May 4 17:02:46 2023 -0400

            Remove debug print statements

        commit 712eb2a11825e8d36f2870deb12b35486bd633fb
        Author: Kendell Clement <[email protected]>
        Date:   Thu May 4 16:40:07 2023 -0400

            Allow dashes in filenames resolve #73

        commit a439f094745b2b5e7f032f0777d4c67e6d6f93c5
        Author: Kendell Clement <[email protected]>
        Date:   Sat Apr 22 23:41:58 2023 -0400

            Raise exceptions from within futures in plot_pool

        commit 7e807a60de2a9d18bccd034b87106ceaf7153338
        Author: Kendell Clement <[email protected]>
        Date:   Sat Apr 22 23:38:56 2023 -0400

            Fix future pandas indexing warning

            Pandas error was "FutureWarning: Calling float on a single element Series is deprecated and will raise a TypeError in the future. Use float(ser.iloc[0]) instead"

        commit 304a92aa7a7ef8c705cb070dce25d9a2e5745ba9
        Author: Cole Lyman <[email protected]>
        Date:   Thu Apr 20 13:59:27 2023 -0600

            Remove debug print statements fixes #295 (#297)

            The format string option used here is only available in Python version >=3.8.

        commit 478c06f784603e96d20f96e91993fdcc4ac35c8a
        Author: Kendell Clement <[email protected]>
        Date:   Thu Apr 13 12:09:26 2023 -0400

            Update plotCustomAllelePlot.py script for #292 (#293)

            Update type of 'max_rows' param to int
            Fix location of 'args' in crispresso2_info object

        commit bcdae39e05d530f4a4e78738c3b30f7664981919
        Author: Kendell Clement <[email protected]>
        Date:   Mon Mar 27 13:18:34 2023 -0400

            Update pooled parameter format

        commit 546446e36e7e68b527767d6c31ec341a49df2059
        Author: Kendell Clement <[email protected]>
        Date:   Tue Feb 14 16:26:23 2023 -0500

            Fix running plots in parallel (#286)

            The reason the plots were running slower before this change is because I was
            calling the plot function, not passing it to `submit`. So it was essentially
            running in serial, but worse because it was still spinning up/down the
            processes.

            Co-authored-by: Cole Lyman <[email protected]>
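
The difference between calling a function and passing it to `submit` can be shown in a few lines. This sketch uses `ThreadPoolExecutor` instead of a process pool so that it runs anywhere; `make_plot` is a hypothetical stand-in for a plotting function that simply reports which thread executed it.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def make_plot(name):
    """Hypothetical plot function; reports which thread did the work."""
    return name, threading.current_thread().name


caller = threading.current_thread().name
with ThreadPoolExecutor(max_workers=2) as pool:
    # Calling the function: the work happens here, in the calling thread.
    called_directly = make_plot("plot_a")
    # Passing the function and its arguments: the work happens in a worker.
    future = pool.submit(make_plot, "plot_b")
    submitted = future.result()

print(called_directly[1] == caller, submitted[1] == caller)
```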

        commit d75f32a2eb5aeaaee866c09e5655a3e27af8b1a1
        Author: kclem <[email protected]>
        Date:   Fri Feb 10 15:45:15 2023 -0500

            Fix #283 to avoid filename collisions

            Previously, amplicon names longer than 21bp were truncated, but the check for uniqueness wasn't working, so it would overwrite some plot files. This fixes the filename collision and enforces uniqueness in reference filename prefixes. Thanks @mbiokyle29
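
A sketch of how truncated prefixes can be kept unique is shown below. The 21-character limit comes from the message above; the numeric-suffix scheme and the `unique_prefixes` helper are assumptions, not the fix that was actually committed, and the input names are assumed to be distinct.

```python
def unique_prefixes(names, max_len=21):
    """Map each name to a prefix of at most max_len characters, with no repeats."""
    seen = set()
    prefixes = {}
    for name in names:
        candidate = name[:max_len]
        counter = 1
        while candidate in seen:
            # Make room for a numeric suffix and try again.
            suffix = f"_{counter}"
            candidate = name[: max_len - len(suffix)] + suffix
            counter += 1
        seen.add(candidate)
        prefixes[name] = candidate
    return prefixes


names = ["HBB_exon1_amplicon_long_A", "HBB_exon1_amplicon_long_B"]
prefixes = unique_prefixes(names)
print(prefixes)
```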

        commit e577318006cd17b2725bd028e5e56634c6eb829a
        Author: kclem <[email protected]>
        Date:   Mon Feb 6 16:37:25 2023 -0500

            Case-insensitive headers accepted in CRISPRessoPooled

        commit d34927620a4a6126a9988b3041e76f60728abbfe
        Author: Kendell Clement <[email protected]>
        Date:   Tue Jan 31 13:48:33 2023 -0500

            Fix print statement in CORE

        commit ee88b7ed89c395f68225a50dea44a2ad69d5e9a5
        Author: Kendell Clement <[email protected]>
        Date:   Tue Jan 31 13:22:51 2023 -0500

            Version bump to 2.2.12

        commit 1d4679c72d0c8b4154317c9aff5179217198e2d7
        Author: Kendell Clement <[email protected]>
        Date:   Tue Jan 31 13:01:31 2023 -0500

            Status Updates + Pooled Mixed Mode Update (#279)

            * Implement logging handler to overwrite the latest log status to file

            * Add StatusHandler to CRISPRessoCORE log

            This will take the latest log output and write it to a file (`status.txt`), the
            catch being that with each log the file is overwritten so that one can easily
            tell where CRISPResso currently is and what the error is (if any). These changes
            include some slight refactoring in order to accommodate any potential parameter
            exceptions.

            * Add StatusHandler to CRISPRessoBatch and refactor `logger.warn` to `warn`

            * Add StatusHandler to CRISPRessoPooled and a little refactoring

            * Implement `percent_complete` to the status log

            * Add StatusHandler to CRISPRessoAggregate log

            * Add StatusHandler to CRISPRessoCompare log

            * Add StatusHandler to CRISPRessoPooledWGSCompare log

            * Add StatusHandler to CRISPRessoWGS log

            * Rename `status.txt` to `CRISPResso_status.txt`

            * Modify status log names to match the tool they are generated from

            * Add percent_complete stages to CRISPRessoCORE

            These also include log statements of each plot that is being generated as well
            as fixing some variable name collisions with `ind`.

            * Format the percentage in the log to be 2 decimal places

            * Change all plotting logs from `info` to `debug` and simplify progress

            This refactors how the progress of the plots is calculated, making it much
            simpler. Before this change we would have had to keep track of the number of
            times `percent_complete` was output, but now it simply updates the percent
            complete after each amplicon is finished processing. Hopefully this will make
            things easier to maintain even though it will be a little less "accurate" (not
            sure how accurate the original implementation was...).

            * Implemented shared console log handler across all CRISPResso* calls

            This allows for easy changes to logging formatting, which was inspired by having
            to change the default logging level. The default logging level needs to be set
            at `logging.DEBUG` in order for the debug log statements to not be ignored for
            the running and status logs.

            * Add ability to set the verbosity level to each CRISPResso* tool

            This allows users to set a verbosity level between 1 and 4 using the
            `-v`/`--verbosity` CLI parameter. If the `--debug` flag is present, then the
            level will default to 4, being the most verbose.

            * Implement showing the last seen `percent_complete` when none is provided

            * Keep track of and log when multiple parallel runs are completed

            These changes modify `CRISPRessoMultiProcessing.run_crispresso_cmds` such that
            we can now display when a run is completed. This potentially breaks how
            signals and interrupts are handled with multiple runs happening, but this needs
            to be reviewed.

            * Add debug and percentage complete to CRISPRessoBatch

            * Add percent complete to CRISPRessoPooled

            * Add debug and percent_complete message to CRISPRessoAggregate

            * Add `percent_complete` to CRISPRessoCompare

            * Add `percent_complete` to CRISPRessoPooledWGSCompare

            * Add status and `percent_complete` to CRISPRessoMeta

            * Add `verbosity` arguments to CRISPRessoCompare and CRISPRessoPooledWGSCompare

            * Fixing documentation to match pooled headers

            * Header removal bug fix change documentation to guide_seq

            * Update documentation and help feature for CRISPRessoPooled

            * Remove extra newlines from CRISPRessoPooled -h

            * Make variable names as clear as my firstborn child's name

            * Update one more variable name

            * Fix bug to flow CRISPRessoPooled options to sub command

            * Make amplicon file args variable name clear

            * Update how parameters are set and retrieved from parameter object

            The refactor in the previous commit changed the type of the arguments to a
            dictionary which doesn't have the parameters as attributes, and this commit
            fixes that error.

            * Add note in output header for change in default CRISPRessoPooled

            In the next release (2.3.0) the `--demultiplex_only_at_amplicons` will be the
            default when running in mixed-mode. This is to allow for inexact alignments of
            the reads and the amplicons to the genome. For more context, see this issue
            https://github.com/pinellolab/CRISPResso2/issues/276

            * Clarify the verbosity parameter help message

            * Separate out parameters to `normalize_name` in CRISPRessoCORE

            * Separate out parameters to `normalize_name` in CRISPRessoWGS

            * Separate out parameters to `normalize_name` in CRISPRessoPooled

            * Separate out parameters to `normalize_name` in CRISPRessoCompare

            * Fix bug in CRISPRessoPooled by replacing `database_id` with `normalize_name`

            * Refactor `run_crispresso_cmds` to not require a `logger`

            This commit implements the functionality to make the `logger` object optional by
            seeing which module called the `run_crispresso_cmds` function and obtaining the
            correct object from that module name.

            The function also immediately returns when no commands are passed to it.

            * Add amplicon name to plotting debug statements in CRISPRessoCORE

            ---------

            Co-authored-by: Cole Lyman <[email protected]>
            Co-authored-by: Cole Lyman <[email protected]>
            Co-authored-by: Cole Lyman <[email protected]>
            Co-authored-by: Samuel Nichols <[email protected]>
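
The status-file behavior described in this commit (each log message overwrites the previous one) can be sketched with a small `logging.Handler` subclass. The class name and file name follow the commit message, but the body is an assumed minimal implementation, not the code that was merged, and it leaves out `percent_complete`.

```python
import logging
import os
import tempfile


class StatusHandler(logging.Handler):
    """Overwrite a status file with the most recent log message."""

    def __init__(self, filename):
        super().__init__()
        self.filename = filename

    def emit(self, record):
        # Mode "w" truncates the file, so only the latest message is kept.
        with open(self.filename, "w") as handle:
            handle.write(self.format(record) + "\n")


status_path = os.path.join(tempfile.mkdtemp(), "CRISPResso_status.txt")
logger = logging.getLogger("status_sketch")
logger.setLevel(logging.DEBUG)
logger.propagate = False  # keep the sketch's messages out of the root logger
logger.addHandler(StatusHandler(status_path))

logger.info("Aligning sequences...")
logger.info("Making plots...")

with open(status_path) as handle:
    latest = handle.read()
print(latest)
```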

        commit ff7eca76e6a3a08af4ac18ac4e88d20f2a06b1f9
        Author: Kendell Clement <[email protected]>
        Date:   Thu Jan 26 15:27:27 2023 -0500

            CRISPRessoPooled custom header fix (#278)

            * Fixing documentation to match pooled headers

            * Header removal bug fix change documentation to guide_seq

            * Update documentation and help feature for CRISPRessoPooled

            * Remove extra newlines from CRISPRessoPooled -h

            * Make variable names as clear as my firstborn child's name

            * Update one more variable name

            Co-authored-by: Samuel Nichols <[email protected]>

        commit 104866e1080c973bb025d1a5ba59b19dca1658af
        Author: Cole Lyman <[email protected]>
        Date:   Thu Jan 5 14:00:26 2023 -0700

            Fix deprecated numpy type names (fixes #269) (#270)

            In the most recent version of numpy (1.24) some of the types have been
            deprecated. This commit fixes these errors.

        commit 58a8e42df88b66fad6b4f6ad04a5b9d9d43d01b4
        Author: Cole Lyman <[email protected]>
        Date:   Thu Jan 5 06:49:35 2023 -0700

            Add snippet about installing CRISPResso2 via bioconda on Apple silicon (#274)

            I have suffered enough trying to debug my installation, so hopefully this helps
            someone else.

            Co-authored-by: Cole Lyman <[email protected]>

        commit b9851e98104602eb78c2b384105267624295e9d3
        Author: Cole Lyman <[email protected]>
        Date:   Thu Dec 22 13:30:23 2022 -0700

            Fix bug when pooled bam is input (#265)

            This change checks to see if a bam file was input, and if so it doesn't try to
            remove any intermediate files because there aren't any.

            Co-authored-by: Cole Lyman <[email protected]>

        commit b822612642043e75a19042941f69b457ce51f517
        Author: Kendell Clement <[email protected]>
        Date:   Mon Dec 19 15:26:45 2022 -0500

            Delete vscode settings

        commit b99aa624dec68ef7d19264340ce0cafa829625f4
        Author: Kendell Clement <[email protected]>
        Date:   Mon Dec 19 13:29:14 2022 -0500

            Clarify input param help for pooled bam

        commit 3fae1e8b821ec6b1890bff6561fa8fa67dc49a04
        Author: Kendell Clement <[email protected]>
        Date:   Mon Dec 19 13:28:54 2022 -0500

            Fix #235 - Cigar string is * if read unaligned

            Previously, the bam would set the cigar string to 0 if the read was unaligned. This breaks the sam->bam conversion and causes the errors in #235.
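
For reference, a SAM record for an unmapped read uses `*` for the fields that are unavailable, including CIGAR. The helper below is a hypothetical sketch of formatting such a record and is not the function changed in this commit.

```python
def unaligned_sam_record(read_name, sequence, qualities):
    """Format a SAM line for an unmapped read; unavailable fields are '*' or 0."""
    fields = [
        read_name,  # QNAME
        "4",        # FLAG: 0x4 marks the read as unmapped
        "*",        # RNAME: no reference
        "0",        # POS
        "0",        # MAPQ
        "*",        # CIGAR: '*' when unavailable, never '0'
        "*",        # RNEXT
        "0",        # PNEXT
        "0",        # TLEN
        sequence,   # SEQ
        qualities,  # QUAL
    ]
    return "\t".join(fields)


record = unaligned_sam_record("read1", "ACGT", "IIII")
print(record)
```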

        commit c65ba07dc5a983453cdf7bb1e27005230dac6f1b
        Author: Cole Lyman <[email protected]>
        Date:   Thu Dec 8 13:48:17 2022 -0700

            Add deprecation notice (#260)

            * Add FLASh and Trimmomatic deprecation notice to CLI output

            * Add Edilytics email address to CLI output

        commit 2a30e5a45f5350ee7c6435bce1cd4edc4d31668a
        Author: Kendell Clement <[email protected]>
        Date:   Tue Dec 6 12:16:19 2022 -0500

            Format filterReadsOnSequencePresence script

        commit 9d764414edd88a46ad5e4f496e4f1c8d5d60ce3e
        Author: Kendell Clement <[email protected]>
        Date:   Fri Dec 2 22:12:54 2022 -0500

            Clarify default CRISPRessoPooled settings for use_legacy_bowtie2_options_string

        commit 9ddea40f7f02b546941ddaa4c71fc5283075051a
        Author: kclem <[email protected]>
        Date:   Mon Nov 14 10:33:04 2022 -0500

            Add check for prime editing extension sequence in prime edited sequence

            If the user specifies prime_editing_override_prime_edited_ref_seq, the sequence might not contain the extension sequence (for example, if the extension sequence is not provided in the appropriate orientation), so check for that here. The extension sequence should be provided as the reverse complement of the prime-edited sequence.
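
A check of this kind can be sketched as a reverse-complement containment test. The sequences below are made up, and `extension_in_edited_ref` is a hypothetical helper; CRISPResso's own check may work differently (for example, by alignment rather than exact matching).

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def extension_in_edited_ref(extension_seq, edited_ref_seq):
    """True if the reverse complement of the extension occurs in the edited reference."""
    return reverse_complement(extension_seq).upper() in edited_ref_seq.upper()


edited_ref = "TTGACCATGGA"  # made-up prime-edited reference
extension = "CCATGGTC"      # reverse complement of GACCATGG
print(extension_in_edited_ref(extension, edited_ref))
```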

        commit 152f2dd5001da7090641ee8a1326bde9f7e8104e
        Author: kclem <[email protected]>
        Date:   Wed Nov 9 11:53:41 2022 -0500

            Version bump to 2.2.11a

        commit 9ed356e3a0c6c316d0860d121772f80ddca6de1d
        Author: kclem <[email protected]>
        Date:   Wed Nov 9 11:47:30 2022 -0500

            Add param to override prime editing sequence checks

            CRISPResso checks that prime editing guides are provided in the proper orientation (e.g. pegRNA 3'->5', spacer sequence 5'->3') and checks these orientations by alignment. Sometimes, the alignment can be better in the opposite direction, and this parameter allows these checks to be overridden. Otherwise, these checks would halt the program and produce the output 'The prime editing pegRNA spacer sequence appears to be given in the 3\'->5\' order. The prime editing pegRNA spacer sequence (--prime_editing_pegRNA_spacer_seq) must be given in the RNA 5\'->3\' order.'

        commit 39dd80afb98a22b7edb6f801c363d86bb77eeb5b
        Author: kclem <[email protected]>
        Date:   Wed Nov 9 10:06:51 2022 -0500

            Update filterReadsOnSequencePresence.py

        commit fe55526927e3fb6e17c9a8a6f59c7057bc1e14eb
        Author: Kendell Clement <[email protected]>
        Date:   Mon Nov 7 22:25:16 2022 -0500

            Add script to filter input based on sequence presence

        commit 713e57a19c35180035ca35e11a5820065eda0198
        Author: Kendell Clement <[email protected]>
        Date:   Tue Oct 18 16:02:26 2022 -0400

            Allow spaces in read names for CRISPRessoWGS

        commit 39ce008bdddccdd8229c0ba185dce78bc2f66968
        Author: Cole Lyman <[email protected]>
        Date:   Sat Oct 8 21:09:58 2022 -0600

            Fix typo of CRISPResssoPlot when plotting nucleotide quilt (#250)

        commit 6a2b342c8503b7327c0a2414edfbd16912d60ca5
        Author: Kendell Clement <[email protected]>
        Date:   Sat Oct 8 23:08:47 2022 -0400

            Batch amplicon plots (#251)

            * Error out if HDR amplicon matches existing amplicon

            * Add check for amplicon sequence uniqueness

            * Fix bug with bam_input not having bam_output

            * Test for no returned lines in auto mode, version bump to 2.2.11

            * Fix pandas deprecation of df.append

        commit 726b2b93d6e419a1b0aa6a968c97edc55b4cc5a8
        Author: Kendell Clement <[email protected]>
        Date:   Thu Oct 6 16:32:02 2022 -0400

            Fix CRISPRessoBatch plot pool bug when plots are suppressed

        commit 7e5049c4dfb88cbc87c91935a91d1f51120a10c2
        Author: Cole Lyman <[email protected]>
        Date:   Wed Sep 21 21:04:51 2022 -0600

            Fix batch quilt plot name (#249)

            This fixes an incorrectly named allele quilt plot input in CRISPRessoBatch.

        commit 1821ca5029c5a1485733f13ab3f2048b4f1fa04e
        Author: Kendell Clement <[email protected]>
        Date:   Thu Sep 15 15:49:08 2022 -0400

            Version bump to 2.2.10

        commit c5f79aebfc1ae209f4ee320df250eed89a02787c
        Author: Cole Lyman <[email protected]>
        Date:   Wed Sep 14 14:24:55 2022 -0600

            Parallel plot refactor (#247)

            * Fix duplicate plotting in CRISPRessoBatch aggregate

            * Refactor multiprocessing plots in CRISPRessoBatch

            * Refactor multiprocessing plots in CRISPRessoCORE

            * Refactor multiprocessing plots for CRISPRessoAggregate

        commit 4ed5e24e6cc1dd8068e2391573ae2438acd32db2
        Author: Kendell Clement <[email protected]>
        Date:   Tue Sep 13 14:12:11 2022 -0400

            print files in curr dir if Aggregate can't find files

        commit ce25bc06f29988e7a10afd0b6a09ba0caf0950e0
        Author: Kendell Clement <[email protected]>
        Date:   Mon Sep 12 10:32:57 2022 -0400

            Spelling typo

        commit c15f01c75083403f17c58c121b2afe97e9f2a1ec
        Author: Kendell Clement <[email protected]>
        Date:   Tue Sep 6 17:49:52 2022 -0400

            Add helper function to create alignment scoring matrix

            New scoring matrix can be created using CRISPResso2Align.make_matrix()

        commit c80f82838c5a228b79ad4484092877cfee08e02c
        Author: Cole Lyman <[email protected]>
        Date:   Mon Aug 22 18:28:33 2022 -0600

            Add `zip_output` (#240)

            * Making zip of results

            * Zip command added, if zip is true place_report_in_output_folder is also true, zip removes all files while zipping

            * Adding --zip to compare and pooled/wgs compare

            * Add more formatting changes to CRISPRessoShared

            * Refactoring propagate_crispress_options so only one version exists

            * Zip added to arguments_to_ignore and warning added when changing arguments

            * Restore styling

            * Update README to include --zip

            * Rename --zip to --zip_output

            * Change --zip to --zip_output in CompareCORE and PooledWGSCompareCORE

            * Bug fix arg to args

            Co-authored-by: Samuel Nichols <[email protected]>

        commit 5de3d7286d8e33c7cf4d3615fce715806e72f511
        Author: Kendell Clement <[email protected]>
        Date:   Thu Aug 11 21:42:34 2022 -0400

            Fix fix to aggregate for CRISPRessoWGS

        commit a2294c266f43b14969a5d6474076f31a77a57173
        Author: Kendell Clement <[email protected]>
        Date:   Thu Aug 11 21:40:50 2022 -0400

            Fix bug in aggregate for WGS

        commit 7ce3eb4abe4b8ceac933272ac9cb16a8bedf26a3
        Author: Kendell Clement <[email protected]>
        Date:   Mon Aug 8 21:53:45 2022 -0400

            Update CRISPRessoWGS to allow non-word characters in region names

        commit 040ac0033d6e250f4e3a412101874cf5e914e08a
        Author: kclem <[email protected]>
        Date:   Mon Aug 8 16:04:59 2022 -0400

            Enable processing of cram files by CRISPRessoWGS

            Adds --reference to samtools view when viewing cram files
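
As a rough sketch of the behavior described above, a command builder might add `--reference` only when the input is a CRAM file. The function name `samtools_view_command` and its parameters are hypothetical; only the `samtools view --reference` option itself is taken from the commit message.

```python
from typing import List, Optional


def samtools_view_command(
    alignment_file: str,
    region: str,
    reference_fasta: Optional[str] = None,
) -> List[str]:
    """Build a `samtools view` argument list (hypothetical helper).

    CRAM files are reference-compressed, so a reference FASTA is passed
    with --reference when the input ends in .cram.
    """
    cmd = ["samtools", "view"]
    if alignment_file.lower().endswith(".cram"):
        if reference_fasta is None:
            raise ValueError("a reference FASTA is required to view CRAM files")
        cmd += ["--reference", reference_fasta]
    cmd += [alignment_file, region]
    return cmd
```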

        commit cf112a0caba8789e28530cc09171285ec6ea9b4c
        Author: kclem <[email protected]>
        Date:   Mon Aug 8 14:55:46 2022 -0400

            Auto amplicon detection for interleaved input

            Enables processing of interleaved fastq files for guess_guides and guess_amplicons, as well as get_most_frequent_reads. When interleaved input is present, the input is first separated into R1/R2 files, then processing is performed.
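
The separation step mentioned above could look something like the sketch below. It assumes four-line FASTQ records that strictly alternate R1, R2, R1, R2; the function names are hypothetical and this is not the code from the commit.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str, str, str]  # header, sequence, separator, qualities


def read_fastq_records(lines: Iterable[str]) -> Iterator[Record]:
    """Group lines into four-line FASTQ records."""
    buf: List[str] = []
    for line in lines:
        buf.append(line.rstrip("\n"))
        if len(buf) == 4:
            yield (buf[0], buf[1], buf[2], buf[3])
            buf = []
    if buf:
        raise ValueError("truncated FASTQ record at end of input")


def split_interleaved(lines: Iterable[str]) -> Tuple[List[Record], List[Record]]:
    """Split interleaved records into (R1 records, R2 records)."""
    r1: List[Record] = []
    r2: List[Record] = []
    for i, record in enumerate(read_fastq_records(lines)):
        (r1 if i % 2 == 0 else r2).append(record)
    if len(r1) != len(r2):
        raise ValueError("interleaved input has an unpaired read")
    return r1, r2
```

Once the two lists exist, each can be written to its own file and handled like ordinary paired-end input.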

        commit 4ba524dc7b947feca8a0f743837844f9febc2171
        Author: Cole Lyman <[email protected]>
        Date:   Thu Aug 4 11:32:11 2022 -0600

            Potential fix for aggregate plots in Batch mode (#237)

        commit 6097a8a104d3f156ef7c08e196ac37e32bf04c71
        Author: Kendell Clement <[email protected]>
        Date:   Thu Jul 21 22:45:48 2022 -0400

            Fix pct_vectors in crispresso2_info json object

        commit 65a079d86d6f386793397398f839c46014b54543
        Author: Kendell Clement <[email protected]>
        Date:   Wed Jul 20 23:46:37 2022 -0400

            Fix more readme spelling bugs

        commit e817376ecd54cdea1f29e303ca25b9e7d1d38333
        Author: Kendell Clement <[email protected]>
        Date:   Wed Jul 20 23:42:23 2022 -0400

            Fix bug in readme spelling

        commit 49740ba1d66ed6d13a9e154b8b17bc8b5186581d
        Author: Kendell Clement <[email protected]>
        Date:   Wed Jul 20 16:10:09 2022 -0400

            Fix loading of crispresso info from WGS and Pooled

        commit b68a43271115251b18e8955e285ccc18f549e8cd
        Author: Kendell Clement <[email protected]>
        Date:   Thu Jul 14 14:11:04 2022 -0400

            Add plotly to dockerfile

        commit b0b7d41d697304d0d5fc93e3346c9de1b98ba41d
        Author: Kendell Clement <[email protected]>
        Date:   Thu Jul 14 14:10:00 2022 -0400

            Fix #231 Allow N's in bam output (Try 2)

        commit c460b3e73fd06a230dbac2e37c86b833144ebf94
        Author: Kendell Clement <[email protected]>
        Date:   Thu Jul 14 14:09:10 2022 -0400

            Revert "Fix #231 Allow N's in bam output"

            This reverts commit 2f6ad1dbe05210af9ccc1b1f17783cd212a888d3.

        commit 2f6ad1dbe05210af9ccc1b1f17783cd212a888d3
        Author: Kendell Clement <[email protected]>
        Date:   Thu Jul 14 13:52:37 2022 -0400

            Fix #231 Allow N's in bam output

        commit 0a2419e518dc9b3520058c3927f98b31cd51347e
        Author: Cole Lyman <[email protected]>
        Date:   Fri Jul 8 21:10:01 2022 -0600

            Fix bug when name is provided instead of amplicon_name in pooled input file (#229)

            Also, raise an exception (instead of incorrectly executing) when there are not
            enough matched parameters in the pooled input file.

        commit cb58212379803788c04ca5793baaa760cbbeaa81
        Author: Cole Lyman <[email protected]>
        Date:   Fri Jul 8 21:09:49 2022 -0600

            Fix bug when comparing two samples with the same name. (#228)

        commit e8a796f5f451409cbafed4404dfba4b6b8a124ca
        Author: Kendell Clement <[email protected]>
        Date:   Thu Jun 23 21:30:23 2022 -0400

            Version bump to 2.2.9

        commit 632143ddedea48bab9229baeb4bf3ea4d1f658d6
        Author: Cole Lyman <[email protected]>
        Date:   Mon Jun 20 19:53:14 2022 -0600

            Don't run global frameshift plot when there are no reads (#226)

            When there are no reads (i.e. global_MODIFIED_FRAMESHIFT +
            global_MODIFIED_NON_FRAMESHIFT + global_NON_MODIFIED_NON_FRAMESHIFT == 0) there
            was a bug when trying to compute the pie chart, because all of the values in the
            pie chart are 0. This fix makes sure that there is at least one read so that
            the plot can be constructed properly.
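
The guard described above amounts to checking that the three counts sum to a positive number before plotting. A minimal sketch, with a hypothetical function name and lower-cased parameter names:

```python
def should_plot_global_frameshift(
    modified_frameshift: int,
    modified_non_frameshift: int,
    non_modified_non_frameshift: int,
) -> bool:
    """Return True only if at least one read contributes to the pie chart.

    A pie chart whose wedges are all zero cannot be normalized, so the
    plot is skipped when the total is zero.
    """
    total = modified_frameshift + modified_non_frameshift + non_modified_non_frameshift
    return total > 0
```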

        commit 4bb06218e835d2624d53fd401542caef6f8a3a55
        Author: kclem <[email protected]>
        Date:   Fri Jun 3 16:57:02 2022 -0400

            Improvements for guide inference in 'auto' mode

            In 'auto' mode, a putative guide sequence is selected at the site of maximal editing. If the site of maximal editing falls near the end of the amplicon (e.g. base 0), many things will break (e.g. quantification windows). This update excludes bases from being used to find the guide, using the --exclude_bp_from_left and --exclude_bp_from_right parameters. By default, these parameters are 15bp, so the first and last 15bp cannot be selected as the site of maximal editing and thus as the site of a guide sequence. In addition, the site of maximal editing must have 3x the magnitude of the background.
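
The selection rule described above can be illustrated with the following sketch. It is not the CRISPResso2 implementation: the function name is hypothetical, and "background" is assumed here to be the mean modification count over the other candidate positions, which the commit message does not specify.

```python
from typing import Optional, Sequence


def find_max_editing_site(
    mod_counts: Sequence[float],
    exclude_bp_from_left: int = 15,
    exclude_bp_from_right: int = 15,
    min_fold_over_background: float = 3.0,
) -> Optional[int]:
    """Return the index of maximal editing, or None if no site qualifies."""
    start = exclude_bp_from_left
    end = len(mod_counts) - exclude_bp_from_right
    if end - start < 2:
        return None  # not enough positions left to compare against a background
    candidates = range(start, end)
    best = max(candidates, key=lambda i: mod_counts[i])
    others = [mod_counts[i] for i in candidates if i != best]
    background = sum(others) / len(others)  # assumed definition of background
    if mod_counts[best] <= 0:
        return None
    if mod_counts[best] < min_fold_over_background * background:
        return None
    return best
```

With these defaults, a large peak at base 0 is ignored because it lies inside the excluded left margin, and a flat profile yields no site at all.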

        commit 9d64de187835b2553ad2b4374d32edab27f83645
        Author: Kendell Clement <[email protected]>
        Date:   Thu Jun 2 20:22:25 2022 -0400

            Update README.md

        commit 6aafc5387986f5089ba55b68d128343d68052792
        Author: Simon P Shen <[email protected]>
        Date:   Tue May 31 17:42:53 2022 -0400

            directory in quotes in batch cmd (#222)

            Add quotes around output folder for folders that have spaces.
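
For illustration, one way to keep a folder name with spaces intact when building a shell command in Python is `shlex.quote`. This swaps in a standard-library technique rather than reproducing the commit's change; the function name and the use of `--output_folder` here are assumptions made for the sketch.

```python
import shlex


def build_sub_command(base_cmd: str, output_folder: str) -> str:
    """Append a shell-quoted output folder to a command string (hypothetical helper)."""
    return f"{base_cmd} --output_folder {shlex.quote(output_folder)}"
```

Because the folder is quoted, a shell treats `my results/run 1` as a single argument instead of splitting it at the spaces.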

        commit 432f163ac68b9a650d1fd326171aadc505ee87f4
        Author: Kendell Clement <[email protected]>
        Date:   Tue May 24 23:38:36 2022 -0400

            CRISPRessoBatch fills NA values in batch settings

            NA values in CRISPRessoBatch are filled with the value from args - either the default value or the value from the command line args (if set)
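
The fill behavior described above can be sketched with plain dictionaries. The real code most likely works on a pandas DataFrame; this standard-library version, with hypothetical names, only illustrates the rule that a missing per-sample value falls back to the value held in args.

```python
import math
from typing import Any, Dict, List


def is_na(value: Any) -> bool:
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def fill_na_from_args(
    batch_rows: List[Dict[str, Any]],
    arg_values: Dict[str, Any],
) -> List[Dict[str, Any]]:
    """Return copies of the batch rows with missing values taken from arg_values."""
    filled_rows = []
    for row in batch_rows:
        filled = dict(row)  # copy so the input rows are not modified
        for column, value in row.items():
            if is_na(value) and column in arg_values:
                filled[column] = arg_values[column]
        filled_rows.append(filled)
    return filled_rows
```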

        commit 6de774adbad3aa8cd99d07b0ba7692984b356cd4
        Author: kclem <[email protected]>
        Date:   Mon May 23 14:18:02 2022 -0400

            Fix file naming bug for HDR outputs

            In html file, figures 4e and 4f incorrectly referenced figure 4d. This fixes this bug.

        commit b88fec0668a4082a12ead3d26582e86d829dd7cc
        Author: Kendell Clement <[email protected]>
        Date:   Sat May 21 00:32:15 2022 -0400

            For bam_output, fix bug that wrote unaligned lines twice

        commit 3564e77ebcdedb4b01cc01dcca18ba3221fac67c
        Author: Kendell Clement <[email protected]>
        Date:   Thu May 19 16:32:18 2022 -0400

            Update README with CRISPRessoPooled headers and bam_output parameters

        commit bc08d81f17cb1929d1c37a1773cffcf36fb12fe2
        Author: Kendell Clement <[email protected]>
        Date:   Thu May 19 16:11:30 2022 -0400

            Add more links to tools

        commit 006c497a379ecd94b017a883a5db887861e1586a
        Author: Kendell Clement <[email protected]>
        Date:   Thu May 19 16:08:14 2022 -0400

            Add links to tools

        commit dc8243373ad00d6bd467fc30c59942596ff0c5d6
        Author: Kendell Clement <[email protected]>
        Date:   Mon May 16 21:38:06 2022 -0400

            fastq_to_bam implementation (#219)

        commit e88b6833977c6b2768299e0b2e7af623e3a9ae7c
        Author: Kendell Clement <[email protected]>
        Date:   Sun May 8 02:14:13 2022 -0400

            Fix bug for when guides don't agree in CRISPRessoAggregate

        commit 7eb763116a8c60603f1cd654645215767ee8eb52
        Author: Kendell Clement <[email protected]>
        Date:   Thu May 5 03:28:21 2022 -0400

            Fix bug for case of empty summary plots in report generation

        commit 0324fa67d14ed945f0c9531d9bcf73ebcf4ca042
        Author: Kendell Clement <[email protected]>
        Date:   Thu May 5 03:28:02 2022 -0400

            Create report for number of significant bases in CRISPRessoCompare

        commit e3c9d0026a9ee6732f3ed6bdcf2a824850d7e66a
        Author: Kendell Clement <[email protected]>
        Date:   Wed May 4 22:43:11 2022 -0400

            Update pickle to json in readme and CRISPRessoPooledWGSCompare

        commit 1553f7977c12bf1091a20ca55b878bccfb739b61
        Author: Kendell Clement <[email protected]>
        Date:   Wed May 4 18:10:04 2022 -0400

            Merge pull request #4 from pinellolab/master (#218)

        commit bcecbfc047d294e26f381a6668e08cb4db24445c
        Merge: 15b0e05b bb13e007
        Author: Kendell Clement <[email protected]>
        Date:   Wed May 4 18:06:37 2022 -0400

            Merge branch 'master' into master

        commit bb13e007738d6e7a4909e01f03daff592f334f36
        Merge: af4ab6e8 d0b41483
        Author: Kendell Clement <[email protected]>
        Date:   Wed May 4 17:59:32 2022 -0400

            Merge branch 'master' of https://github.com/edilytics/CRISPResso2

        commit 15b0e05b9e03bbec5236e58776ddf9aa2f93180e
        Author: Kendell Clement <[email protected]>
        Date:   Wed May 4 17:54:52 2022 -0400

            2 flexible pooled input (#217)

            * Batch type coerce and r2 file check

            * Upgrade tabs for bootstrap5

            * Update readme with additional pooled amplicon file headers

            Co-authored-by: Samuel Nichols <[email protected]>

        commit d0b41483bee704940ba60c58289f412b04c71659
        Author: Kendell Clement <[email protected]>
        Date:   Wed May 4 13:43:43 2022 -0400

            Update README.md

        commit ce49fab5301cb73ba0daf6c765e350eb083c76f1
        Merge: 5f909713 b913fcb4
        Author: Kendell Clement <[email protected]>
        Date:   Wed May 4 13:40:30 2022 -0400

            Merge pull request #3 from edilytics/2-flexible-pooled-input

            Add flexibility to CRISPRessoPooled amplicon input by allowing headers. Also, prime editing and quantification window coordinate parameters can be passed to CRISPRessoPooled.

        commit b913fcb402a8ba3106c3ff7913563a33d8d19fca
        Author: Kendell Clement <[email protected]>
        Date:   Wed May 4 13:38:25 2022 -0400

            Update CRISPRessoPooledCORE.py

            Replace process to read header, increase flexibility for column order

        commit 945bf31f16530b7ce25b89095b2c7005bf146117
        Merge: 7b8f6788 5f909713
        Author: Kendell Clement <[email protected]>
        Date:   Wed May 4 12:45:24 2022 -0400

            Merge branch 'master' into 2-flexible-pooled-input

        commit 5f9097133765736a7c2fe3c8e9b730845fed0b70
        Author: Kendell Clement <[email protected]>
        Date:   Wed May 4 12:23:44 2022 -0400

            Version bump to 2.2.8

        commit c4a94ce0e06c6ebae13e128fbe6b708e635121c4
        Author: Kendell Clement <[email protected]>
        Date:   Wed May 4 00:13:17 2022 -0400

            Fix summary plot representation for multi reports

            * fixed old reference to make_multi_report which called old summary plot format
            * renamed summary_plot to summary_plots to reflect a dict with multiple plots

        commit 62900e9ae6fa37ce99a04f12a63ed5c912f75042
        Author: Cole Lyman <[email protected]>
        Date:   Tue May 3 20:47:52 2022 -0600

            Large aggregation (#192)

            * Squashed commit of the following:

            commit 8564eb03f0d9e62abf4b7528baf5c2ae296be8f9
            Merge: f6ef62c 07cc7d8
            Author: Kendell Clement <[email protected]>
            Date:   Tue Jan 11 16:20:15 2022 -0500

                Merge branch 'indel-alignment-fix' of https://github.com/edilytics/CRISPResso2 into indel-alignment-fix

            commit 07cc7d856ab3fcbbaa5381f17f29568192388887
            Author: Cole Lyman <[email protected]>
            Date:   Fri Dec 10 15:29:59 2021 -0700

                Fix bug in `find_indels_substitutions`

                This bug occurred when there was a deletion at the end of a sequence, and was
                thus not properly accounted for.

            commit f6ef62cfdf909adac1b10ea86555cd218f8b2a74
            Author: Cole Lyman <[email protected]>
            Date:   Fri Dec 10 15:29:59 2021 -0700

                Fix bug in `find_indels_substitutions`

                This bug occurred when there was a deletion at the end of a sequence, and was
                thus not properly accounted for.

            commit 7212f87f4be60057a6c848947ff6b5efde132a25
            Author: Cole Lyman <[email protected]>
            Date:   Fri Dec 10 15:26:17 2021 -0700

                Add a unit test for `find_indels_substitutions`

                This unit test checks for deletions at the end of a sequence, which are
                inherently outside of the include_indx_set window.

            commit d50b4e903b973c71a275e31d470b40e59280ee13
            Author: Cole Lyman <[email protected]>
            Date:   Fri Dec 10 15:03:22 2021 -0700

                Fix a bug in `find_indels_substitutions`

                The bug that this commit fixes is when an insertion occurs at the edge of the
                include indexes. The trouble with this earlier was that it was using the `idx`
                to calculate the size of the insertion, but the `idx` wasn't being incremented
                anymore because it was outside of the include window.

            commit 4db066f7bc333b7662a9232ac732ebb33ac3ace8
            Author: Cole Lyman <[email protected]>
            Date:   Fri Dec 10 15:01:39 2021 -0700

                Add test case for `find_indels_substitutions`

                This test case is extracted from the CRISPRessoBatch integration test and
                provides an example where there is an insertion at the edge of the include
                index.

            commit 3b3a7417f5bbd6c2785a2af54a47e01d2e820451
            Author: Cole Lyman <[email protected]>
            Date:   Fri Dec 10 11:37:07 2021 -0700

                Fix bug in CRISPRessoCompare where sample names were not properly set

                This was a place where it was (partially) missed during the crispresso2_info
                object refactoring.

            commit e9f5eff3d95b676b5ee2e23371a5604f600d34b2
            Author: Cole Lyman <[email protected]>
            Date:   Fri Dec 10 15:26:17 2021 -0700

                Add a unit test for `find_indels_substitutions`

                This unit test checks for deletions at the end of a sequence, which are
                inherently outside of the include_indx_set window.

            commit d4d45a918254ab19a7e7956e9e731389c6f36ecb
            Author: Cole Lyman <[email protected]>
            Date:   Fri Dec 10 15:03:22 2021 -0700

                Fix a bug in `find_indels_substitutions`

                The bug that this commit fixes is when an insertion occurs at the edge of the
                include indexes. The trouble with this earlier was that it was using the `idx`
                to calculate the size of the insertion, but the `idx` wasn't being incremented
                anymore because it was outside of the include window.

            commit 13f00bb40239c83e6e5cf844561fdb7000d3d9ab
            Author: Cole Lyman <[email protected]>
            Date:   Fri Dec 10 15:01:39 2021 -0700

                Add test case for `find_indels_substitutions`

                This test case is extracted from the CRISPRessoBatch integration test and
                provides an example where there is an insertion at the edge of the include
                index.

            commit 659ae34e8fd106f7ecc163b5bea0b5a80ab0283c
            Author: Cole Lyman <[email protected]>
            Date:   Fri Dec 10 11:37:07 2021 -0700

                Fix bug in CRISPRessoCompare where sample names were not properly set

                This was a place where it was (partially) missed during the crispresso2_info
                object refactoring.

            * Add parameter `--suppress_batch_summary_plots`

            If many runs are run at the same time, batch summary plots may fail because they are too large for matplotlib. This parameter `--suppress_batch_summary_plots` allows individual runs to be plotted, but suppresses batch summary plots that may otherwise be too big.

            * Pep formatting cleanup

            * Add summary nucleotide plots to aggregate

            * Aggregate plots are paginated

            * Update CRISPRessoAggregateCORE.py

            Remove max sample limit for plotting

            * Add --max_samples_per_summary_plot to CRISPRessoAggregate

            Parameterize the max number of samples to plot on each page of reports. Additional PDFs will be created with this number of samples on them.

            * Add plotly function to plot an interactive heatmap

            * Fix deprecated numpy type to suppress warning

            * Add plotting of heatmaps to CRISPRessoAggregateCORE to summarize modification types

            These heatmaps are interactive (zoomable and pannable) and show for each sample
            the percentage of insertions, substitutions, and deletions.

            * Add the heatmap summaries to the CRISPRessoAggregate report

            * Update Bootstrap to 5.1.3

            This is mainly so that we can use the fullscreen modal functionality in this version.

            * Move the plotly heatmaps to a Bootstrap modal

            * Fix bug where plots were not filling up entire modal.

            I have tried countless different ways for this to work, and this is the best
            that I can come up with. After the modal is opened it triggers the plot to
            resize, and then for some reason you need to trigger the resize event. I think
            this is because a `div` changing size won't actually trigger the resizing of the
            plot (and neither will just calling `Plotly.Plots.resize`...?!).

            * Update the axis labels and add autosize to plotly heatmaps

            I'm pretty sure the autosize doesn't do anything, but it is there for good
            measure.

            * Abandon attempts to make plots fullscreen

            This includes removing the Bootstrap modal (two out of the three plots would
            resize properly and I couldn't figure out a way to have the plot displayed
            outside of the modal). I have left in some javascript to make the plot
            fullscreen, but I couldn't get the formatting quite right and the plot wasn't
            much bigger in the fullscreen version because there was a ton of space between
            the plot and the heatmap. If some brave soul would like to tackle it, feel free!

            * Rename and refactor how plot data is passed around

            I have consolidated how the plot data is passed around, so that now you can pass
            in only one dict with all of the information instead of 4 or 5 separate
            parameters. I also renamed the `heatmap_plot_*` to
            `allele_modification_heatmap_*`.

            * Implement the line plot version of the modification percentages

            This also includes correctly resizing the plot when the line plot tab is
            selected!

            * Change default `max_samples_per_summary_plot` to be 150 instead of 250

            * Remove extra assignments of `this_number_samples` and suppress plot

            The plot that is suppressed is the large nucleotide quilt when there is a large
            number of samples. Is it okay to suppress this plot @kclem?

            * Implement parallel plotting in CRISPRessoAggregate

            * Fix sample indexing error and heatmap scaling for large number of samples

            * Add plotly requirement to setup.py

            * Remove space around vertical barcharts

            * Add scrollbar to long images in multiReport

            * Fill in default (empty) values to allele modification plots

            When not running CRISPRessoAggregate, default values for the
            `allele_modification_heatmap_plot` and `allele_modification_lin_plot`
            dictionaries will be set so that the template can be properly rendered.

            * Include CRISPRessoBatch in the refactor of how summary_plot dicts are handled

            * Update dockerfile for new docker

            * minor bug fixes for plotCustomAllelePlot.py to work with Python3 (#212)

            * Allow for flexible parsing of quant window coordinates

            * CRISPRessoPooled debug flash command, fix pep formatting

            * Set flexiguide homology parameter type to int

            * Coerce ints in batch file checking (#200)

            * Batch type coerce and r2 file check

            * Revert "Batch type coerce and r2 file check"

            This reverts commit f91736688ea9739cf3063e3601c52ad6da1116a4.

            * Coerce int values

            * Handle multiple qwcs in batch mode

            If multiple qwcs were provided in batch mode, a parsing error would occur. This fixes this bug.

            * Fix bug from old pandas for int cols

            Evidently old pandas versions throw an error if a column doesn't exist. This checks to see if the column exists before the values are set.

            * Create allele modification heatmaps and line plots in CRISPRessoBatch

            * Add allele modification heatmaps and line plots to CRISPRessoBatch

            * Make all plots in CRISPRessoBatch run in parallel

            * Make `--suppress_batch_summary_plots` store true

            Also, only open and shutdown the process pool when necessary.

            * Add blank values for allele_modification entries when not present

            Co-authored-by: Kendell Clement <[email protected]>
            Co-authored-by: dharjanto <[email protected]>
            Co-authored-by: Samuel Nichols <[email protected]>
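
The pagination controlled by `--max_samples_per_summary_plot` in the commit above can be pictured as simple chunking of the sample list. The sketch below uses a hypothetical function name and takes the default of 150 from the commit message; it is not the code in CRISPRessoAggregateCORE.

```python
from typing import List, Sequence


def paginate_samples(
    sample_names: Sequence[str],
    max_samples_per_summary_plot: int = 150,
) -> List[List[str]]:
    """Split sample names into pages of at most max_samples_per_summary_plot."""
    if max_samples_per_summary_plot < 1:
        raise ValueError("max_samples_per_summary_plot must be at least 1")
    step = max_samples_per_summary_plot
    return [list(sample_names[i:i + step]) for i in range(0, len(sample_names), step)]
```

Each returned page would then correspond to one summary plot or PDF.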

        commit f67376fc9ab0e407d4086aa42fd1c77706ebc9c0
        Author: Kendell Clement <[email protected]>
        Date:   Fri Apr 15 00:46:30 2022 -0400

            Fix bug from old pandas for int cols

            Evidently old pandas versions throw an error if a column doesn't exist. This checks to see if the column exists before the values are set.

        commit b34fe2956ff88629809b2434878028723dfc4895
        Author: Kendell Clement <[email protected]>
        Date:   Thu Apr 14 23:58:07 2022 -0400

            Handle multiple qwcs in batch mode

            If multiple qwcs were provided in batch mode, a parsing error would occur. This fixes this bug.

        commit c94e3b9f2e301bda91e9c1e6f4ef794b33b5dbf0
        Author: Samuel Nichols <[email protected]>
        Date:   Thu Apr 14 21:48:32 2022 -0600

            Coerce ints in batch file checking (#200)

            * Batch type coerce and r2 file check

            * Revert "Batch type coerce and r2 file check"

            This reverts commit f91736688ea9739cf3063e3601c52ad6da1116a4.

            * Coerce int values

        commit fc4542491bb86eb143db0044a848a56234403496
        Author: Kendell Clement <[email protected]>
        Date:   Thu Apr 14 22:13:23 2022 -0400

            Set flexiguide homology parameter type to int

        commit 23fe2aa8e26067d1bcf36bfafc67e023c7588d2f
        Author: Kendell Clement <[email protected]>
        Date:   Thu Apr 14 22:12:37 2022 -0400

            CRISPRessoPooled debug flash command, fix pep formatting

        commit d292d33d8c1fa3bfd2cee656643fd47bcdab161d
        Author: Kendell Clement <[email protected]>
        Date:   Thu Apr 14 22:00:19 2022 -0400

            Allow for flexible parsing of quant window coordinates

        commit e1667cb53a7ea6fbb33369c8530a78639ed423ec
        Author: dharjanto <[email protected]>
        Date:   Mon Apr 11 22:08:21 2022 -0400

            minor bug fixes for plotCustomAllelePlot.py to work with Python3 (#212)

        commit 7b8f6788da18f6ab173fa3c3d10f4ab6bb2acc26
        Author: Samuel Nichols <[email protected]>
        Date:   Fri Apr 8 10:21:00 2022 -0600

            Update README

        commit 9bc24cd0474ed9f398dff64274d3181c4b2f8637
        Author: Samuel Nichols <[email protected]>
        Date:   Tue Mar 29 11:25:09 2022 -0600

            Using Amplicon_Name

        commit 88ac5d72074b3da63de035e02c911ce34cd29414
        Merge: b6057a2d e5afa478
        Author: Samuel Nichols <[email protected]>
        Date:   Mon Mar 28 22:32:09 2022 -0600

            Merge remote-tracking branch 'origin/master' into 2-flexible-pooled-input

        commit b6057a2d54cb8637ff0900416de8e2de72213f76
        Author: Samuel Nichols <[email protected]>
        Date:   Mon Mar 28 20:53:05 2022 -0600

            Printing info statements for matched headers

        commit af4ab6e8507d7aa4b7b68f217a458e0d9c966f55
        Merge: bbb7d6f0 51a943c3
        Author: Cole Lyman <[email protected]>
        Date:   Fri Mar 25 09:44:13 2022 -0600

            Merge branch 'pinellolab:master' into master

        commit 3c1eb012fc02563e3e963f17a62c7e932f5bcddc
        Author: Samuel Nichols <[email protected]>
        Date:   Thu Mar 24 12:31:43 2022 -0600

            Debugging and column checking

        commit 0b47acbc592a6df6adf14641357b2104b76be691
        Author: Samuel Nichols <[email protected]>
        Date:   Wed Mar 23 09:42:51 2022 -0600

            New variables added to pooled

        commit a0ff3a44d6d19d7b37f91919b5c0180206f72d53
        Author: Samuel Nichols <[email protected]>
        Date:   Mon Mar 21 09:32:28 2022 -0600

            Read as string not bytes

        commit 710675fc3c0307e21103abd604315b47ff80a894
        Author: Samuel Nichols <[email protected]>
        Date:   Wed Mar 16 13:51:30 2022 -0600

            Adding command building for new options

        commit f386818a48e5c840bd567611e6f1320c8146cac7
        Author: Samuel Nichols <[email protected]>
        Date:   Wed Mar 16 10:08:33 2022 -0600

            Comment out df_template.iloc instance

        commi…
kclem added a commit that referenced this pull request Apr 5, 2024
* Change CRISPResso_status.txt format to JSON (#46)

* Bug Fix - 367 (#35)

* - Fixed references to ref_names_for_pe

* removed extra tabs

* trying to match empty line, no tabs

* - changed references to ref_names[0]

* Mckay/pd warnings (#45)

* refactor errors='ignore' to try except

* refactored integer slice to iloc[]

* moved to_numeric try except to function

* Refactor to_numeric_ignore_errors to to_numeric_ignore_columns

This change is slightly cleaner because it addresses the root issue that some
columns are strings (and can therefore not be converted to numeric types). Now
if an error does occur when converting the dfs to numeric types it won't be
swallowed up.
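
A minimal sketch of the idea described above, assuming the helper takes a DataFrame plus a collection of column names to skip. The signature, the column names, and the sample data are illustrative assumptions, not taken from the CRISPResso2 source.

```python
import pandas as pd


def to_numeric_ignore_columns(df, ignore_columns):
    """Convert every column to a numeric dtype except those in ignore_columns.

    Hypothetical sketch: the real helper may differ in signature and behavior.
    """
    for col in df.columns:
        if col not in ignore_columns:
            # pd.to_numeric raises by default (errors='raise'), so a bad value
            # in a column that is expected to be numeric is not swallowed.
            df[col] = pd.to_numeric(df[col])
    return df


# Example data (made up): 'Batch' holds strings and is skipped explicitly.
df = pd.DataFrame({'Batch': ['a', 'b'], 'Reads': ['10', '20'], 'Pct': ['1.5', '2.5']})
df = to_numeric_ignore_columns(df, {'Batch'})
print(df.dtypes)
```

Naming the string columns up front, rather than catching conversion errors, means an unexpected non-numeric value in any other column still surfaces as an exception.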

* Add documentation to to_numeric_ignore_columns

---------

Co-authored-by: Cole Lyman <[email protected]>

---------

Co-authored-by: Cole Lyman <[email protected]>

* add json read for status file

* changed Formatter to json format

* fixed json access variable name: message

* changed percentage_complete to numeric

* changed status file to .json

* Create integration_tests.yml

* Simplify name

* CRISPRESSO2_DIR environment variable

* Up one dir

* ls workspace

* Install CRISPResso and ydiff

* Clone repo instead of checkout

* submodule

* ls

* CRISPResso2_copy

* ls

* Update env

* Simplify

* Pull from githubactions branch

* Pull githubactions repo

* Checkout githubactions

* Run tests individually

* Pin plotly version

* Run all tests even if one fails

* Test on another branch

* Switch branch with token

* Update integration_tests.yml

* New makefile commands

* changed file to .json

* changed status to json file

* Make JSON human readable by adding new lines
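
As a rough illustration of the status-file changes listed above, the following sketch writes and reads a small JSON status record. The file name, the key names (`percentage_complete`, `message`), and the use of indentation to add line breaks are assumptions made for this example; the actual CRISPResso2 status format may differ.

```python
import json
import os
import tempfile


def write_status(path, percentage_complete, message):
    """Write a status record as indented JSON (hypothetical layout)."""
    status = {
        'percentage_complete': float(percentage_complete),  # numeric, not a string
        'message': message,
    }
    with open(path, 'w') as fh:
        # indent=2 puts each key on its own line so the file is easy to read.
        json.dump(status, fh, indent=2)
        fh.write('\n')


def read_status(path):
    """Load the status record back into a dict."""
    with open(path) as fh:
        return json.load(fh)


# Assumed file name, written to a temporary directory for the example.
status_path = os.path.join(tempfile.mkdtemp(), 'CRISPResso_status.json')
write_status(status_path, 25, 'Aligning sequences...')
loaded = read_status(status_path)
print(loaded['percentage_complete'], loaded['message'])
```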

* GitHub actions integration tests (#48)

* GitHub actions clean (#40)

* Create pytest.yml

* Create pylint.yml

* Create .pylintrc

* Create test_env.yml

* Full path

* Remove conda install

* Replace path

* Pytest tests

* pip -e

* Create integration_tests.yml

* Simplify name

* CRISPRESSO2_DIR environment variable

* Up one dir

* ls workspace

* Install CRISPResso and ydiff

* Clone repo instead of checkout

* submodule

* ls

* CRISPResso2_copy

* ls

* Update env

* Simplify

* Pull from githubactions branch

* Pull githubactions repo

* Checkout githubactions

* Mckay/pd warnings (#45)

* refactor errors='ignore' to try except

* refactored integer slice to iloc[]

* moved to_numeric try except to function

* Refactor to_numeric_ignore_errors to to_numeric_ignore_columns

This change is slightly cleaner because it addresses the root issue that some
columns are strings (and can therefore not be converted to numeric types). Now
if an error does occur when converting the dfs to numeric types it won't be
swallowed up.

* Add documentation to to_numeric_ignore_columns

---------

Co-authored-by: Cole Lyman <[email protected]>

* Run tests individually

* Pin plotly version

* Run all tests even if one fails

* Test on another branch

* Switch branch with token

* Update integration_tests.yml

* Introduce pandas sorting in CRISPRessoCompare (#47)

* New makefile commands

* Fix interleaved fastq input in CRISPRessoPooled and suppress CRISPRessoWGS params (#42)

* Extract out split_interleaved_fastq function to CRISPRessoShared

* Implement splitting interleaved fastq files in CRISPRessoPooled

* Suppress split_interleaved_input from CRISPRessoWGS parameters

* Suppress other parameters in CRISPRessoWGS

* Move where interleaved fastq files are split to be trimmed properly

* Bug Fix - 367 (#35)

* - Fixed references to ref_names_for_pe

* removed extra tabs

* trying to match empty line, no tabs

* - changed references to ref_names[0]

* Mckay/pd warnings (#45)

* refactor errors='ignore' to try except

* refactored integer slice to iloc[]

* moved to_numeric try except to function

* Refactor to_numeric_ignore_errors to to_numeric_ignore_columns

This change is slightly cleaner because it addresses the root issue that some
columns are strings (and can therefore not be converted to numeric types). Now
if an error does occur when converting the dfs to numeric types it won't be
swallowed up.

* Add documentation to to_numeric_ignore_columns

---------

Co-authored-by: Cole Lyman <[email protected]>

---------

Co-authored-by: Cole Lyman <[email protected]>

* On push no branches

* On push no branches

* All in one file

* Fix yml errors

* Rename jobs

* Remove old workflow files

* Remove paths

* Run jobs in parallel

---------

Co-authored-by: mbowcut2 <[email protected]>
Co-authored-by: Cole Lyman <[email protected]>

* Move read filtering to after merging in CRISPResso (#39)

* Move read filtering to after merging

This is in an effort to be consistent with the behavior and results of
CRISPRessoPooled.

* Properly assign the correct file names for read filtering

* Add space around operators

* GitHub actions on pr (#51)

* Run integration tests on pull_request

* Run pytest on pull_request

* Run pylint on pull_request

* Run tests on PR only when opening PR (#53)

* Update reports (#52)

* Update report changes

* Switch branch of integration test repo

* Remove extraneous `crispresso_data_path`

* Point integration tests back to master

* point to test branch

* pointed CI config to testing branch

* Update integration_tests.yml

point to master

---------

Co-authored-by: Cole Lyman <[email protected]>
Co-authored-by: Samuel Nichols <[email protected]>

* Trevor/fastp integration (#50)

* Update check_program to check versions and create check_fastq function

* Update fastq arg, implement fastp in get_most_frequent_reads

* Bump version to 2.3.0

* Deprecate Flash and Trimmomatic parameters, and update fastp params

* Update guess_amplicons and guess_guides to remove max_paired_end_reads_overlap

* Implement trimming of single end reads

* Merge (and trim) reads in CRISPRessoCORE with fastp

* Modify error handling to account for fastp errors

* Replace flash and trimmomatic with fastp in Docker dependencies

* Update LICENSE.txt with fastp info

* Remove min and max amplicon length (no longer needed)

* Mckay/pd warnings (#45)

* refactor errors='ignore' to try except

* refactored integer slice to iloc[]

* moved to_numeric try except to function

* Refactor to_numeric_ignore_errors to to_numeric_ignore_columns

This change is slightly cleaner because it addresses the root issue that some
columns are strings (and can therefore not be converted to numeric types). Now
if an error does occur when converting the dfs to numeric types it won't be
swallowed up.

* Add documentation to to_numeric_ignore_columns

---------

Co-authored-by: Cole Lyman <[email protected]>

* Implement trimming with fastp in CRISPRessoPooled

* Implement merging (and trimming) with fastp in CRISPRessoPooled

* Fixed minor fastp errors

* Move read filtering to after merging in CRISPResso (#39)

* Move read filtering to after merging

This is in an effort to be consistent with the behavior and results of
CRISPRessoPooled.

* Properly assign the correct file names for read filtering

* Add space around operators

* GitHub actions on pr (#51)

* Run integration tests on pull_request

* Run pytest on pull_request

* Run pylint on pull_request

* Run tests on PR only when opening PR (#53)

* Update reports (#52)

* Update report changes

* Switch branch of integration test repo

* Remove extraneous `crispresso_data_path`

* Point integration tests back to master

* Update where the test point to

* Fix 'Prime-edited' key not found (#32)

* Move 'Prime-edited' amplicon name check

By moving this, it will check if there is an amplicon named
'Prime-edited' (which is a reserved name) even if the
`prime_editing_pegRNA_extension_seq` parameter is empty.
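
The reordering described above can be sketched roughly as follows. The function name, its arguments, and the error type are hypothetical; only the reserved name 'Prime-edited' and the `prime_editing_pegRNA_extension_seq` parameter come from the commit message.

```python
def check_prime_editing_inputs(amplicon_names, prime_editing_pegRNA_extension_seq=''):
    """Reject the reserved amplicon name before looking at the pegRNA sequence.

    Hypothetical sketch, not the CRISPResso2 implementation.
    """
    # The name check runs unconditionally, so it also fires when the
    # pegRNA extension sequence is empty.
    if 'Prime-edited' in amplicon_names:
        raise ValueError("'Prime-edited' is a reserved amplicon name")
    # Prime-editing-specific handling applies only when a sequence is given.
    return bool(prime_editing_pegRNA_extension_seq)


uses_prime_editing = check_prime_editing_inputs(['Reference', 'HDR'])
print(uses_prime_editing)
```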

* Only search for scaffold integration when pegRNA extension seq is provided

* Remove spaces at the end of lines

* Docker size (#49)

* Bug Fix - 367 (#35)

* - Fixed references to ref_names_for_pe

* removed extra tabs

* trying to match empty line, no tabs

* - changed references to ref_names[0]

* Mckay/pd warnings (#45)

* refactor errors='ignore' to try except

* refactored integer slice to iloc[]

* moved to_numeric try except to function

* Refactor to_numeric_ignore_errors to to_numeric_ignore_columns

This change is slightly cleaner because it addresses the root issue that some
columns are strings (and can therefore not be converted to numeric types). Now
if an error does occur when converting the dfs to numeric types it won't be
swallowed up.

* Add documentation to to_numeric_ignore_columns

---------

Co-authored-by: Cole Lyman <[email protected]>

---------

Co-authored-by: Cole Lyman <[email protected]>

* GitHub actions integration tests (#48)

* GitHub actions clean (#40)

* Create pytest.yml

* Create pylint.yml

* Create .pylintrc

* Create test_env.yml

* Full path

* Remove conda install

* Replace path

* Pytest tests

* pip -e

* Create integration_tests.yml

* Simplify name

* CRISPRESSO2_DIR environment variable

* Up one dir

* ls workspace

* Install CRISPResso and ydiff

* Clone repo instead of checkout

* submodule

* ls

* CRISPResso2_copy

* ls

* Update env

* Simplify

* Pull from githubactions branch

* Pull githubactions repo

* Checkout githubactions

* Mckay/pd warnings (#45)

* refactor errors='ignore' to try except

* refactored integer slice to iloc[]

* moved to_numeric try except to function

* Refactor to_numeric_ignore_errors to to_numeric_ignore_columns

This change is slightly cleaner because it addresses the root issue that some
columns are strings (and can therefore not be converted to numeric types). Now
if an error does occur when converting the dfs to numeric types it won't be
swallowed up.

* Add documentation to to_numeric_ignore_columns

---------

Co-authored-by: Cole Lyman <[email protected]>

* Run tests individually

* Pin plotly version

* Run all tests even if one fails

* Test on another branch

* Switch branch with token

* Update integration_tests.yml

* Introduce pandas sorting in CRISPRessoCompare (#47)

* New makefile commands

* Fix interleaved fastq input in CRISPRessoPooled and suppress CRISPRessoWGS params (#42)

* Extract out split_interleaved_fastq function to CRISPRessoShared

* Implement splitting interleaved fastq files in CRISPRessoPooled

* Suppress split_interleaved_input from CRISPRessoWGS parameters

* Suppress other parameters in CRISPRessoWGS

* Move where interleaved fastq files are split to be trimmed properly

* Bug Fix - 367 (#35)

* - Fixed references to ref_names_for_pe

* removed extra tabs

* trying to match empty line, no tabs

* - changed references to ref_names[0]

* Mckay/pd warnings (#45)

* refactor errors='ignore' to try except

* refactored integer slice to iloc[]

* moved to_numeric try except to function

* Refactor to_numeric_ignore_errors to to_numeric_ignore_columns

This change is slightly cleaner because it addresses the root issue that some
columns are strings (and can therefore not be converted to numeric types). Now
if an error does occur when converting the dfs to numeric types it won't be
swallowed up.

* Add documentation to to_numeric_ignore_columns

---------

Co-authored-by: Cole Lyman <[email protected]>

---------

Co-authored-by: Cole Lyman <[email protected]>

* On push no branches

* On push no branches

* All in one file

* Fix yml errors

* Rename jobs

* Remove old workflow files

* Remove paths

* Run jobs in parallel

---------

Co-authored-by: mbowcut2 <[email protected]>
Co-authored-by: Cole Lyman <[email protected]>

* 3.4->2.08

* Put ttf-mscorefonts-installer back above apt-get clean

* restore slash, replace fastp with trimmomatic and flash, add autoremove step

---------

Co-authored-by: mbowcut2 <[email protected]>
Co-authored-by: Cole Lyman <[email protected]>

* initial readme modifications

* Updated readme to remove deprecated commands, updated help text to reflect new version and fastp

* Pointing test branch back at master

---------

Co-authored-by: Cole Lyman <[email protected]>
Co-authored-by: mbowcut2 <[email protected]>
Co-authored-by: Samuel Nichols <[email protected]>

* Guardrails clean history (#34)

* Include guardrail functions

* Add CRISPRessoReports subtree

* Refactor to use CRISPRessoReports module

* Include guardrail functions

* Functional guardrails, needs reports update

* Add guardrail partial

* fix guardrails partial

* Bug Fix - 367 (#35)

* - Fixed references to ref_names_for_pe

* removed extra tabs

* trying to match empty line, no tabs

* - changed references to ref_names[0]

* Mckay/pd warnings (#45)

* refactor errors='ignore' to try except

* refactored integer slice to iloc[]

* moved to_numeric try except to function

* Refactor to_numeric_ignore_errors to to_numeric_ignore_columns

This change is slightly cleaner because it addresses the root issue that some
columns are strings (and can therefore not be converted to numeric types). Now
if an error does occur when converting the dfs to numeric types it won't be
swallowed up.

* Add documentation to to_numeric_ignore_columns

---------

Co-authored-by: Cole Lyman <[email protected]>

---------

Co-authored-by: Cole Lyman <[email protected]>

* GitHub actions integration tests (#48)

* GitHub actions clean (#40)

* Create pytest.yml

* Create pylint.yml

* Create .pylintrc

* Create test_env.yml

* Full path

* Remove conda install

* Replace path

* Pytest tests

* pip -e

* Create integration_tests.yml

* Simplify name

* CRISPRESSO2_DIR environment variable

* Up one dir

* ls workspace

* Install CRISPResso and ydiff

* Clone repo instead of checkout

* submodule

* ls

* CRISPResso2_copy

* ls

* Update env

* Simplify

* Pull from githubactions branch

* Pull githubactions repo

* Checkout githubactions

* Mckay/pd warnings (#45)

* refactor errors='ignore' to try except

* refactored integer slice to iloc[]

* moved to_numeric try except to function

* Refactor to_numeric_ignore_errors to to_numeric_ignore_columns

This change is slightly cleaner because it addresses the root issue that some
columns are strings (and can therefore not be converted to numeric types). Now
if an error does occur when converting the dfs to numeric types it won't be
swallowed up.

* Add documentation to to_numeric_ignore_columns

---------

Co-authored-by: Cole Lyman <[email protected]>

* Run tests individually

* Pin plotly version

* Run all tests even if one fails

* Test on another branch

* Switch branch with token

* Update integration_tests.yml

* Introduce pandas sorting in CRISPRessoCompare (#47)

* New makefile commands

* Fix interleaved fastq input in CRISPRessoPooled and suppress CRISPRessoWGS params (#42)

* Extract out split_interleaved_fastq function to CRISPRessoShared

* Implement splitting interleaved fastq files in CRISPRessoPooled

* Suppress split_interleaved_input from CRISPRessoWGS parameters

* Suppress other parameters in CRISPRessoWGS

* Move where interleaved fastq files are split to be trimmed properly

* Bug Fix - 367 (#35)

* - Fixed references to ref_names_for_pe

* removed extra tabs

* trying to match empty line, no tabs

* - changed references to ref_names[0]

* Mckay/pd warnings (#45)

* refactor errors='ignore' to try except

* refactored integer slice to iloc[]

* moved to_numeric try except to function

* Refactor to_numeric_ignore_errors to to_numeric_ignore_columns

This change is slightly cleaner because it addresses the root issue that some
columns are strings (and can therefore not be converted to numeric types). Now
if an error does occur when converting the dfs to numeric types it won't be
swallowed up.

* Add documentation to to_numeric_ignore_columns

---------

Co-authored-by: Cole Lyman <[email protected]>

---------

Co-authored-by: Cole Lyman <[email protected]>

* On push no branches

* On push no branches

* All in one file

* Fix yml errors

* Rename jobs

* Remove old workflow files

* Remove paths

* Run jobs in parallel

---------

Co-authored-by: mbowcut2 <[email protected]>
Co-authored-by: Cole Lyman <[email protected]>

* Update C cythonized files

* Add exact numbers to guardrails printouts

* Remove extraneous whitespace from CRISPRessoCOREResources.pyx

* Fix calculation of `total_mods` so it can no longer be negative

The issue was that `all_deletion_coordinates` just tells you how many deletions
were present, but not how long the deletion is.
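
To make the distinction concrete, here is a small sketch that assumes deletions are stored as one (start, end) pair per event with an exclusive end. That representation and the sample values are assumptions for illustration; how the count fed into `total_mods` is not shown in the commit message.

```python
# Hypothetical deletion coordinates: one (start, end) pair per deletion event,
# with end exclusive. The real structure in CRISPResso2 may differ.
all_deletion_coordinates = [(10, 13), (40, 41), (55, 60)]

# Counting entries gives the number of deletion events...
num_deletions = len(all_deletion_coordinates)

# ...whereas summing the spans gives the number of deleted bases.
deleted_bases = sum(end - start for start, end in all_deletion_coordinates)

print(num_deletions, deleted_bases)
```

Mixing the two quantities in one calculation (events in one term, bases in another) is the kind of mismatch that could push a derived total below zero.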

* Changes to message

* Remove old tag

* Point tests at guardrails

* Restore C2 pro check

* Save message with guardrail name

* Point tests repo at master

---------

Co-authored-by: Cole Lyman <[email protected]>
Co-authored-by: mbowcut2 <[email protected]>

* Fix case sensitivity in Prime Editing mode (#54)

* Move read filtering to after merging in CRISPResso (#39)

* Move read filtering to after merging

This is in an effort to be consistent with the behavior and results of
CRISPRessoPooled.

* Properly assign the correct file names for read filtering

* Add space around operators

* GitHub actions on pr (#51)

* Run integration tests on pull_request

* Run pytest on pull_request

* Run pylint on pull_request

* Run tests on PR only when opening PR (#53)

* Update reports (#52)

* Update report changes

* Switch branch of integration test repo

* Remove extraneous `crispresso_data_path`

* Point integration tests back to master

* Make all amplicons in amplicon_seq_arr uppercase

This fixes https://github.com/pinellolab/CRISPResso2/issues/396

* Allow RNA values to be provided for prime_editing_pegRNA_scaffold_seq

* Fix 'Prime-edited' key not found (#32)

* Move 'Prime-edited' amplicon name check

By moving this, it will check if there is an amplicon named
'Prime-edited' (which is a reserved name) even if the
`prime_editing_pegRNA_extension_seq` parameter is empty.

* Only search for scaffold integration when pegRNA extension seq is provided

* Remove spaces at the end of lines

* Docker size (#49)

* Bug Fix - 367 (#35)

* - Fixed references to ref_names_for_pe

* removed extra tabs

* trying to match empty line, no tabs

* - changed references to ref_names[0]

* Mckay/pd warnings (#45)

* refactor errors='ignore' to try except

* refactored integer slice to iloc[]

* moved to_numeric try except to function

* Refactor to_numeric_ignore_errors to to_numeric_ignore_columns

This change is slightly cleaner because it addresses the root issue that some
columns are strings (and can therefore not be converted to numeric types). Now
if an error does occur when converting the dfs to numeric types it won't be
swallowed up.

* Add documentation to to_numeric_ignore_columns

---------

Co-authored-by: Cole Lyman <[email protected]>

---------

Co-authored-by: Cole Lyman <[email protected]>

* GitHub actions integration tests (#48)

* GitHub actions clean (#40)

* Create pytest.yml

* Create pylint.yml

* Create .pylintrc

* Create test_env.yml

* Full path

* Remove conda install

* Replace path

* Pytest tests

* pip -e

* Create integration_tests.yml

* Simplify name

* CRISPRESSO2_DIR environment variable

* Up one dir

* ls workspace

* Install CRISPResso and ydiff

* Clone repo instead of checkout

* submodule

* ls

* CRISPResso2_copy

* ls

* Update env

* Simplify

* Pull from githubactions branch

* Pull githubactions repo

* Checkout githubactions

* Mckay/pd warnings (#45)

* refactor errors='ignore' to try except

* refactored integer slice to iloc[]

* moved to_numeric try except to function

* Refactor to_numeric_ignore_errors to to_numeric_ignore_columns

This change is slightly cleaner because it addresses the root issue that some
columns are strings (and can therefore not be converted to numeric types). Now
if an error does occur when converting the dfs to numeric types it won't be
swallowed up.

* Add documentation to to_numeric_ignore_columns

---------

Co-authored-by: Cole Lyman <[email protected]>

* Run tests individually

* Pin plotly version

* Run all tests even if one fails

* Test on another branch

* Switch branch with token

* Update integration_tests.yml

* Introduce pandas sorting in CRISPRessoCompare (#47)

* New makefile commands

* Fix interleaved fastq input in CRISPRessoPooled and suppress CRISPRessoWGS params (#42)

* Extract out split_interleaved_fastq function to CRISPRessoShared

* Implement splitting interleaved fastq files in CRISPRessoPooled

* Suppress split_interleaved_input from CRISPRessoWGS parameters

* Suppress other parameters in CRISPRessoWGS

* Move where interleaved fastq files are split to be trimmed properly

* Bug Fix - 367 (#35)

* - Fixed references to ref_names_for_pe

* removed extra tabs

* trying to match empty line, no tabs

* - changed references to ref_names[0]

* Mckay/pd warnings (#45)

* refactor errors='ignore' to try except

* refactored integer slice to iloc[]

* moved to_numeric try except to function

* Refactor to_numeric_ignore_errors to to_numeric_ignore_columns

This change is slightly cleaner because it addresses the root issue that some
columns are strings (and can therefore not be converted to numeric types). Now
if an error does occur when converting the dfs to numeric types it won't be
swallowed up.

* Add documentation to to_numeric_ignore_columns

---------

Co-authored-by: Cole Lyman <[email protected]>

---------

Co-authored-by: Cole Lyman <[email protected]>

* On push no branches

* On push no branches

* All in one file

* Fix yml errors

* Rename jobs

* Remove old workflow files

* Remove paths

* Run jobs in parallel

---------

Co-authored-by: mbowcut2 <[email protected]>
Co-authored-by: Cole Lyman <[email protected]>

* 3.4->2.08

* Put ttf-mscorefonts-installer back above apt-get clean

* restore slash, replace fastp with trimmomatic and flash, add autoremove step

---------

Co-authored-by: mbowcut2 <[email protected]>
Co-authored-by: Cole Lyman <[email protected]>

* Guardrails clean history (#34)

* Include guardrail functions

* Add CRISPRessoReports subtree

* Refactor to use CRISPRessoReports module

* Include guardrail functions

* Functional guardrails, needs reports update

* Add guardrail partial

* fix guardrails partial

* Bug Fix - 367 (#35)

* - Fixed references to ref_names_for_pe

* removed extra tabs

* trying to match empty line, no tabs

* - changed references to ref_names[0]

* Mckay/pd warnings (#45)

* refactor errors='ignore' to try except

* refactored integer slice to iloc[]

* moved to_numeric try except to function

* Refactor to_numeric_ignore_errors to to_numeric_ignore_columns

This change is slightly cleaner because it addresses the root issue that some
columns are strings (and can therefore not be converted to numeric types). Now
if an error does occur when converting the dfs to numeric types it won't be
swallowed up.

* Add documentation to to_numeric_ignore_columns

---------

Co-authored-by: Cole Lyman <[email protected]>

---------

Co-authored-by: Cole Lyman <[email protected]>

* GitHub actions integration tests (#48)

* GitHub actions clean (#40)

* Create pytest.yml

* Create pylint.yml

* Create .pylintrc

* Create test_env.yml

* Full path

* Remove conda install

* Replace path

* Pytest tests

* pip -e

* Create integration_tests.yml

* Simplify name

* CRISPRESSO2_DIR environment variable

* Up one dir

* ls workspace

* Install CRISPResso and ydiff

* Clone repo instead of checkout

* submodule

* ls

* CRISPResso2_copy

* ls

* Update env

* Simplify

* Pull from githubactions branch

* Pull githubactions repo

* Checkout githubactions

* Mckay/pd warnings (#45)

* refactor errors='ignore' to try except

* refactored integer slice to iloc[]

* moved to_numeric try except to function

* Refactor to_numeric_ignore_errors to to_numeric_ignore_columns

This change is slightly cleaner because it addresses the root issue that some
columns are strings (and can therefore not be converted to numeric types). Now
if an error does occur when converting the dfs to numeric types it won't be
swallowed up.

* Add documentation to to_numeric_ignore_columns

---------

Co-authored-by: Cole Lyman <[email protected]>

* Run tests individually

* Pin plotly version

* Run all tests even if one fails

* Test on another branch

* Switch branch with token

* Update integration_tests.yml

* Introduce pandas sorting in CRISPRessoCompare (#47)

* New makefile commands

* Fix interleaved fastq input in CRISPRessoPooled and suppress CRISPRessoWGS params (#42)

* Extract out split_interleaved_fastq function to CRISPRessoShared

* Implement splitting interleaved fastq files in CRISPRessoPooled

* Suppress split_interleaved_input from CRISPRessoWGS parameters

* Suppress other parameters in CRISPRessoWGS

* Move where interleaved fastq files are split to be trimmed properly

* Bug Fix - 367 (#35)

* - Fixed references to ref_names_for_pe

* removed extra tabs

* trying to match empty line, no tabs

* - changed references to ref_names[0]

* Mckay/pd warnings (#45)

* refactor errors='ignore' to try except

* refactored integer slice to iloc[]

* moved to_numeric try except to function

* Refactor to_numeric_ignore_errors to to_numeric_ignore_columns

This change is slightly cleaner because it addresses the root issue that some
columns are strings (and can therefore not be converted to numeric types). Now
if an error does occur when converting the dfs to numeric types it won't be
swallowed up.

* Add documentation to to_numeric_ignore_columns

---------

Co-authored-by: Cole Lyman <[email protected]>

---------

Co-authored-by: Cole Lyman <[email protected]>

* On push no branches

* On push no branches

* All in one file

* Fix yml errors

* Rename jobs

* Remove old workflow files

* Remove paths

* Run jobs in parallel

---------

Co-authored-by: mbowcut2 <[email protected]>
Co-authored-by: Cole Lyman <[email protected]>

* Update C cythonized files

* Add exact numbers to guardrails printouts

* Remove extraneous whitespace from CRISPRessoCOREResources.pyx

* Fix calculation of `total_mods` so it can no longer be negative

The issue was that `all_deletion_coordinates` just tells you how many deletions
were present, but not how long the deletion is.

* Changes to message

* Remove old tag

* Point tests at guardrails

* Restore C2 pro check

* Save message with guardrail name

* Point tests repo at master

---------

Co-authored-by: Cole Lyman <[email protected]>
Co-authored-by: mbowcut2 <[email protected]>

---------

Co-authored-by: Samuel Nichols <[email protected]>
Co-authored-by: mbowcut2 <[email protected]>
Co-authored-by: trevormartinj7 <[email protected]>

* Batch d3 clean (#55)

* imports C2Pro plots if available

* added --use_matplotlib flag

* added C2Pro
matched api function signatures

* added api args for plotly

* added **kwargs

* renamed config to custom_config, more specificity

* added backend flag for plotly kaleido

* added pro_installed boolean for templates, added plotly dependency to report templates

* Squashed commit of the following:

commit c909ea3b34e87ce637e00dac075d2bb2f8bfb954
Author: McKay <[email protected]>
Date:   Thu Feb 15 15:55:23 2024 -0700

    added plotly dependency for pro

commit 76b3601f6a0144f100266153f1c999e0c5de65de
Author: Samuel Nichols <[email protected]>
Date:   Fri Jan 12 09:56:19 2024 -0700

    Squashed commit of the following:

    commit 603f2eff9d1aa21ae95f3e134da303b8018d3a33
    Author: Samuel Nichols <[email protected]>
    Date:   Fri Jan 12 09:48:20 2024 -0700

        fix guardrails partial

    commit 22fc03183a8070c30dfb74d5c23575ac19019855
    Author: Samuel Nichols <[email protected]>
    Date:   Fri Jan 12 08:54:01 2024 -0700

        Add guardrail partial

    commit e55f6b21972b578261bc5a864ce1d653d98f9e34
    Author: Samuel Nichols <[email protected]>
    Date:   Mon Jan 8 07:50:59 2024 -0700

        Functional guardrails, needs reports update

    commit 6e968e9699ed59a47d88191d03768e042d8b60a4
    Merge: 32b49685 e948ce10
    Author: Samuel Nichols <[email protected]>
    Date:   Mon Dec 18 13:34:36 2023 -0700

        Merge branch 'guardrails-clean-history' of https://github.com/edilytics/CRISPResso2 into guardrails-clean-history

    commit 32b49685da320501dad2b0ebbb57887b66220ba8
    Author: Samuel Nichols <[email protected]>
    Date:   Fri Dec 15 15:27:04 2023 -0700

        Include guardrail functions

    commit 4e309cf6f732565d635de3d4c5d074ada3027e2d
    Author: Cole Lyman <[email protected]>
    Date:   Mon Dec 18 10:51:55 2023 -0700

        Refactor to use CRISPRessoReports module

    commit e648dc087c0055bc5d2fca13c64071a371dea941
    Author: Cole Lyman <[email protected]>
    Date:   Mon Dec 18 10:51:11 2023 -0700

        Add CRISPRessoReports subtree

    commit e948ce107ebb0d1d99010ed12e937f34b5e607d4
    Author: Samuel Nichols <[email protected]>
    Date:   Fri Dec 15 15:27:04 2023 -0700

        Include guardrail functions

    commit d33c748871a625facfe8d792e29c77ab9779138f
    Author: Kendell Clement <[email protected]>
    Date:   Tue Nov 7 16:31:06 2023 -0700

        Include parameter --assign_ambiguous_alignments_to_first_reference in readme

    commit a1435f7f491a6a61434f3051e39f39a4c9bf1edc
    Author: Kendell Clement <[email protected]>
    Date:   Wed Oct 11 17:17:30 2023 -0600

        Enable quantification by sgRNA (#348)

        This PR includes:
        - storing the sgRNA-specific editing locations in the crispresso2_info object. Previously, each amplicon would record the indices of quantification windows across the guide, but not for individual guides. This stores the information for each guide in crispresso2_info['results']['refs'][reference_name]['sgRNA_include_idxs']
        - a script (count_sgRNA_specific_edits.py) to parse through an allele table output from a completed CRISPResso run (`--write_detailed_allele_table` flag required) to count edits in each sgRNA separately.

        I don't have a good double-edited sample handy, but it can be run on the demo HDR data [hdr.fastq.gz](http://crispresso.pinellolab.org/static/demo/hdr.fastq.gz) using the command:

        ```

        CRISPResso -r1 hdr.fastq.gz -a acatttgcttctgacacaactgtgttcactagcaacctcaaacagacaccatggtgcatctgactcctgTggagaagtctgccgttactgccctgtggggcaaggtgaacgtggatgaagttggtggtgaggccctgggcaggttggtatcaaggtta -e acatttgcttctgacacaactgtgttcactagcaacctcaaacagacaccatggtgcaCctgactccGgaggagaagtctgccgttactgcGctgtggggcaaggtgaacgtggatgaagttggtggtgaggccctgggcaggttggtatcaaggtta -c atggtgcatctgactcctgTggagaagtctgccgttactgccctgtggggcaaggtgaacgtggatgaagttggtggtgaggccctgggcag -g TGCACCATGGTGTCTGTTTG,GATGAAGTTGGTGGTGAGGCCC --write_detailed_allele_table  -n hdr3 -p max -gn guide1,guide2
        ```

        ```
        python CRISPResso2/scripts/count_sgRNA_specific_edits.py -f CRISPResso_on_hdr3
        ```

        This produces:
        ```
        Processed 25000 alleles
        Reference: Reference (2391/23415 modified reads)
                UNMODIFIED: 21024
                MODIFIED guide1: 2359
                MODIFIED guide2: 32
        Reference: HDR (856/1577 modified reads)
                UNMODIFIED: 721
                MODIFIED guide1: 854
                MODIFIED guide1 + guide2: 1
                MODIFIED guide2: 1
        ```
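
As a rough illustration of the per-guide counting described above, the sketch below labels each allele by which guide windows contain a modified position. The function name `count_edits_by_guide`, the `guide_windows` mapping, and the `(modified_positions, n_reads)` tuples are hypothetical. They are not the actual layout of `sgRNA_include_idxs` or of the detailed allele table, and this is not the logic of `count_sgRNA_specific_edits.py` itself.

```python
from collections import Counter


def count_edits_by_guide(alleles, guide_windows):
    """Count reads by the set of guide windows that contain a modification.

    alleles: iterable of (modified_positions, n_reads) tuples (hypothetical layout).
    guide_windows: dict of guide name -> set of reference indices (hypothetical layout).
    """
    counts = Counter()
    for modified_positions, n_reads in alleles:
        modified = set(modified_positions)
        # Guides whose window overlaps at least one modified position
        hit_guides = sorted(
            name for name, window in guide_windows.items() if modified & set(window)
        )
        # Assumption: edits outside every guide window count as unmodified
        label = "MODIFIED " + " + ".join(hit_guides) if hit_guides else "UNMODIFIED"
        counts[label] += n_reads
    return counts


# Toy data, not taken from the HDR demo above
windows = {"guide1": {10, 11, 12}, "guide2": {40, 41, 42}}
toy_alleles = [([], 5), ([11], 3), ([41], 1), ([10, 42], 2), ([25], 4)]
toy_counts = count_edits_by_guide(toy_alleles, windows)
```

Labels such as `MODIFIED guide1 + guide2` mirror the output format shown above; the toy numbers are made up.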

    commit 2e3da02fdbed2fa8ae02a277763d65a502459827
    Author: Cole Lyman <[email protected]>
    Date:   Tue Oct 10 15:29:08 2023 -0600

        changed tuple to list for matplotlib change (#31) (#346)

        Co-authored-by: mbowcut2 <[email protected]>

    commit cd3c332135fe4db0f9218e3d87263d5c65838ed9
    Author: Kendell Clement <[email protected]>
    Date:   Sun Oct 1 01:54:46 2023 -0600

        rename script to camel case

    commit 7c719d65fb36ac7654db9040f226564ea28fcab9
    Author: Kendell Clement <[email protected]>
    Date:   Sun Oct 1 01:53:44 2023 -0600

        Add new script for counting high quality bases

    commit f97cd2795e89464bcc9321ccfdbca3e6af2bcb4f
    Author: Kendell Clement <[email protected]>
    Date:   Thu Sep 14 15:15:30 2023 -0600

        Prime editing alignment params (#336)

        Adds two parameters to control alignment of pegRNA components: --prime_editing_gap_open_penalty and --prime_editing_gap_extend_penalty.

        CRISPResso checks to see whether the pegRNA spacer and extension sequence are in the correct orientation, but sometimes they could align in the incorrect orientation with a higher score (e.g. via insertion of multiple gaps, whereas a single long gap would be preferred). Introducing these two parameters allows users to adjust the alignment parameters specifically for these prime-editing checks without adjusting the global alignment parameters which will be applied to reads that are aligned to the WT reference/prime-editing reference sequences.

        The new prime_editing_gap_open_penalty is set to -50, a higher gap open penalty than the default needleman_wunsch_gap_open penalty (-20). This commit breaks backward-reproducibility, but mostly in the checking of pegRNA component orientation - so previously some CRISPResso runs would have failed and produced an error, but now they will (hopefully) succeed. To achieve complete backward reproducibility, add the flag --prime_editing_gap_open_penalty -20 to runs.

    commit 64cbf36dae85cffa2c15e73f2a7ee8aa1077d917
    Author: Cole Lyman <[email protected]>
    Date:   Thu Sep 7 16:43:30 2023 -0600

        Fix samtools piping (#325)

        * Remove samtools pipe stderr to stdout

        Sometimes some of the libraries that samtools depends on don't have the correct
        version information, and as such samtools will report this to stderr when run.
        Because we pipe the output of samtools, we expect it to be valid SAM format, but
        when these library version messages are reported, it breaks CRISPRessoWGS.

        * Remove extra spacing at end of lines and add missing comma in WGS

        * Log stderr from samtools in CRISPRessoWGS
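
The stderr issue described in this commit can be sketched with `subprocess`: capture stdout and stderr separately so that warnings never end up in the stream that is parsed as SAM. The helper name `run_keep_streams_apart` and the stand-in command are assumptions for illustration; a small Python one-liner plays the role of samtools so the sketch runs without it installed.

```python
import subprocess
import sys


def run_keep_streams_apart(cmd):
    """Run cmd and return (stdout, stderr) as separate strings."""
    proc = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,  # not subprocess.STDOUT, so warnings stay out of stdout
        text=True,
        check=True,
    )
    return proc.stdout, proc.stderr


# Stand-in for a samtools call: one warning on stderr, one SAM-like line on stdout
FAKE_CMD = [
    sys.executable,
    "-c",
    "import sys; sys.stderr.write('library version mismatch\\n'); "
    "sys.stdout.write('read1\\t4\\t*\\n')",
]

sam_text, warnings_text = run_keep_streams_apart(FAKE_CMD)
```

The captured `warnings_text` can then be sent to a log, while only `sam_text` is parsed.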

    commit 8feff4101f27406d9d88ace97d31a518276bff3f
    Author: Cole Lyman <[email protected]>
    Date:   Fri Sep 1 09:43:56 2023 -0600

        Replace link to CRISPResso schematic with raw URL in README (#329)

        * Replace link to CRISPResso schematic with raw URL

        * Add new lines to the beginning of unordered lists

    commit 2e9e6bff5bcc536d5e2ba1440d1ab96d9d47efd6
    Author: Kendell Clement <[email protected]>
    Date:   Thu Aug 10 00:52:12 2023 -0600

        Try to unbreak CircleCI

    commit ae5b95246cb0f6d66c4cbfb50cf8f5a9626b0827
    Author: Kendell Clement <[email protected]>
    Date:   Thu Aug 10 00:17:27 2023 -0600

        Center command line text messages

    commit 4d9c71ecf2248c9bb1e10430178dc318b6621c8b
    Author: Kendell Clement <[email protected]>
    Date:   Thu Aug 10 00:17:07 2023 -0600

        Fix bug in prime-editing scaffold-incorporation plotting

        If read is too short, scaffold incorporation detection will fail because it will check beyond the length of the read.

    commit 2b36a1a5c35e8a93516ce8baf464595615e0f402
    Author: Kendell Clement <[email protected]>
    Date:   Wed Aug 9 15:29:48 2023 -0600

        CRISPRessoPooled --compile_postrun_references bug fixes

    commit 3e04d1d402bcf95edd39fc7c8c9af61bb380f9db
    Author: Kendell Clement <[email protected]>
    Date:   Tue Aug 8 23:30:15 2023 -0600

        Fix missing ' in Pooled --demultiplex_only_at_amplicons

    commit 06af527f9e2020c5cf251e7f1cec0b1eca1c1664
    Author: Cole Lyman <[email protected]>
    Date:   Mon Jul 24 10:47:46 2023 -0600

        Sort pandas dataframes by # of reads and sequences so that the order is consistent (#316)

        * Make sorting stable

        * Including c files

        * Sort by #Reads instead of %Reads to avoid floating point errors
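
The idea behind these bullets can be sketched without pandas: sort on the integer read count and break ties on the sequence, so the order does not depend on input order or on floating-point percentages. The tuple layout and the name `sort_alleles` are assumptions for illustration; the commit itself works on pandas dataframes.

```python
def sort_alleles(alleles):
    """Sort (sequence, n_reads) pairs by read count (descending), then by sequence."""
    return sorted(alleles, key=lambda allele: (-allele[1], allele[0]))


alleles = [("ACGT", 10), ("AAGT", 25), ("ACGA", 10)]
ordered = sort_alleles(alleles)
# The two alleles with 10 reads are ordered by sequence, whatever the input order
same_when_reversed = sort_alleles(list(reversed(alleles))) == ordered
```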

        ---------

        Co-authored-by: Samuel Nichols <[email protected]>

    commit de05533b3511a84f3b6b14fc2ef64db041613261
    Author: Cole Lyman <[email protected]>
    Date:   Thu Jul 6 13:54:45 2023 -0600

        Fix multiprocessing lambda pickling (#311)

        * Fix running plots in parallel

        The reason the plots were running slower before this change is because I was
        calling the plot function, not passing it to `submit`. So it was essentially
        running in serial, but worse because it was still spinning up/down the
        processes.

        * Fix multiprocessing lambda pickling (#20)

        * Refactor process_futures to be a dict

        This makes debugging much easier because you can associate the arguments to the
        future with the results.

        * Fix the pickling error when running in multiprocessing

        Only top-level functions (not lambdas) can be pickled to use in multiprocessing
        pools, thus the lambdas are converted to a regular function.

        * Further fixes to pickling multiprocessing error (#21)

        * Refactor process_futures to be a dict

        This makes debugging much easier because you can associate the arguments to the
        future with the results.

        * Fix the pickling error when running in multiprocessing

        Only top-level functions (not lambdas) can be pickled to use in multiprocessing
        pools, thus the lambdas are converted to a regular function.

        * Use Counter instead of defaultdict in CRISPRessoCORE

        * Update process_futures to dict in Batch and Aggregate
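
The points in this commit message can be sketched as follows. A thread pool is used so the sketch runs anywhere; `submit` has the same signature on `ProcessPoolExecutor`, where the pickling restriction also applies. The names `plot_stub`, `run_plots`, and `is_picklable` are hypothetical and do not come from CRISPResso.

```python
import pickle
from concurrent.futures import ThreadPoolExecutor, as_completed


def plot_stub(name, n_points):
    """Stand-in for a plotting function; returns a string instead of drawing."""
    return "{}:{}".format(name, n_points)


def run_plots(plot_args, max_workers=2):
    """Submit one task per argument tuple and collect results keyed by arguments."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        process_futures = {}
        for args in plot_args:
            # Wrong: pool.submit(plot_stub(*args)) runs the plot right here, serially.
            # Right: pass the function and its arguments separately.
            future = pool.submit(plot_stub, *args)
            # A dict ties each future back to what was submitted, which helps debugging
            process_futures[future] = (plot_stub, args)
        for future in as_completed(process_futures):
            _, args = process_futures[future]
            # result() re-raises any exception raised inside the worker
            results[args] = future.result()
    return results


def is_picklable(obj):
    """Return True if obj can be pickled, as process pools require for tasks."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True
```

For example, `is_picklable(lambda x: x)` is false, which is why the lambdas had to become regular top-level functions before they could be sent to a process pool.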

    commit ebb016dff46c280dce8c3c09e8ac0e0cc25d4d74
    Author: Kendell Clement <[email protected]>
    Date:   Mon Jul 3 17:12:09 2023 -0600

        Enable CRISPRessoPooled multiprocessing when os allows multi-thread file append

    commit 7285da0e987b77b72c8885bb35940e0f50c146bd
    Author: Kendell Clement <[email protected]>
    Date:   Fri Jun 23 16:50:33 2023 -0600

        Fix print bug for invalid fastq

    commit 9acdeac67441f9a1d55ac94b153bcb68fb89b92c
    Author: kclem <[email protected]>
    Date:   Wed Jun 21 16:03:48 2023 -0600

        Slugify before creating filename - replaces invalid characters in batch names with _
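
A minimal sketch of this kind of slugification, assuming that anything outside letters, digits, `_`, `.`, and `-` counts as invalid. The function name `slugify_sketch` and the exact character set are assumptions; the set CRISPResso actually uses may differ.

```python
import re


def slugify_sketch(name):
    """Replace every character outside a conservative filename-safe set with '_'."""
    return re.sub(r"[^A-Za-z0-9_.-]", "_", name.strip())


safe_name = slugify_sketch("batch 1/run:A")
```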

    commit f97e29c67de4c80b8d6b9cf334f363be4b514ade
    Author: Cole Lyman <[email protected]>
    Date:   Wed Jun 21 14:43:43 2023 -0600

        Add verbosity argument to CRISPRessoAggregate (#18) fixes #306 (#307)

        * Add verbosity argument to CRISPRessoAggregate (#18)

        * Allow for amplicon and guide seqs to be some variant of NA in batch (#19)

        This was discovered when attempting to infer amplicon sequences in batch mode on
        the web interface, NAs were supplied for the amplicon sequences to the sub
        CRISPResso commands.

    commit 32e1e9797da5c3033cdc588e92f06b8813961953
    Author: Mark Clement <[email protected]>
    Date:   Wed Jun 21 14:01:00 2023 -0600

        Allow for interrogation of overlapping sgRNA sites

    commit 7248ba8c4deee125ad1ec12fdf1294a84d5f6f93
    Author: Kendell Clement <[email protected]>
    Date:   Mon Jun 12 12:16:47 2023 -0600

        Check input fastq file format

        Asserts input format of fastq files - including if gzipped files are missing the gz suffix.
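
One way such a check could work is sketched below, assuming the test is based on the gzip magic bytes and on the four-line FASTQ record layout. The function name `check_fastq_bytes`, its return strings, and the decision to flag gzipped content without a `.gz` suffix are assumptions, not the checks CRISPResso actually performs.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip stream


def check_fastq_bytes(data, filename):
    """Classify file contents as 'ok', 'not FASTQ', or a gzip suffix mismatch."""
    is_gzipped = data[:2] == GZIP_MAGIC
    if is_gzipped and not filename.endswith(".gz"):
        return "gzipped content without .gz suffix"
    if is_gzipped:
        data = gzip.decompress(data)
    lines = data.decode("ascii", errors="replace").splitlines()
    if len(lines) < 4:
        return "not FASTQ"
    header, seq, plus, qual = lines[:4]
    # A FASTQ record is '@name', sequence, '+', and qualities of the same length
    if not header.startswith("@") or not plus.startswith("+") or len(seq) != len(qual):
        return "not FASTQ"
    return "ok"


record = b"@read1\nACGT\n+\nIIII\n"
```

Only the first record is inspected here; a stricter check would walk the whole file.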

    commit 83c8ab8f462e7d8c1d04c08c1a398b874f517251
    Author: Kendell Clement <[email protected]>
    Date:   Mon Jun 5 13:41:55 2023 -0600

        Fix CRISPRessoArgParser

    commit 14a2c8577f566e1b72d5f4e72cd6cd22079610be
    Author: Kendell Clement <[email protected]>
    Date:   Mon Jun 5 13:29:31 2023 -0600

        Cosmetic updates for command-line use

        - version bump to 2.2.13
        - If no args are provided, the command line version will print out an abbreviated help message
        - parameters can be excluded from CRISPRessoArgParser

    commit 1cd54bc1d03360c3d8121ba9e66b3589fe1cf252
    Author: Cole Lyman <[email protected]>
    Date:   Thu May 11 14:31:47 2023 -0600

        Fix multiprocessing error, don't start pool when only using single thread (#302)

        * Update README to have consistent use of `--base_editor_output` (#16)

        * Add files via upload

        * Only start process pools when using multiple processes

        This is mainly to solve the issue when running on AWS Lambda, but this should
        improve single core performance overall.
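
The last bullet can be sketched like this: when only one process is requested, run the work inline and never create a pool. The function name `run_all` and its parameters are hypothetical.

```python
from concurrent.futures import ProcessPoolExecutor


def run_all(func, items, n_processes=1):
    """Apply func to each item, starting a process pool only when it can help."""
    if n_processes <= 1:
        # No pool: avoids process start-up cost and works where pools are unavailable
        return [func(item) for item in items]
    with ProcessPoolExecutor(max_workers=n_processes) as pool:
        # func must be picklable here, so a top-level function rather than a lambda
        return list(pool.map(func, items))


serial_result = run_all(abs, [-1, 2, -3])
```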

        ---------

        Co-authored-by: Kendell Clement <[email protected]>

    commit 92a705c939b370373a70cf6ae9f1616de33288b9
    Author: Cole Lyman <[email protected]>
    Date:   Thu May 11 14:31:06 2023 -0600

        Update `base_editor` parameters in README and add Plot Harness (#301)

        * Update README to have consistent use of `--base_editor_output` (#16)

        * Add files via upload

        ---------

        Co-authored-by: Kendell Clement <[email protected]>

    commit 7d46c4490235df45c5546b1b470e4e6a99727031
    Author: Cole Lyman <[email protected]>
    Date:   Wed May 10 15:41:33 2023 -0600

        Clarify CRISPRessoWGS intended use (#303)

        * Update README to have consistent use of `--base_editor_output` (#16)

        * Add sample plotting jupyter notebook

        * Add clarifying info to CRISPRessoWGS description

        Clarify WGS usage

    commit 833a701787bb47674b3e921c38cac6189c775cf7
    Author: Kendell Clement <[email protected]>
    Date:   Thu May 4 17:02:46 2023 -0400

        Remove debug print statements

    commit 712eb2a11825e8d36f2870deb12b35486bd633fb
    Author: Kendell Clement <[email protected]>
    Date:   Thu May 4 16:40:07 2023 -0400

        Allow dashes in filenames resolve #73

    commit a439f094745b2b5e7f032f0777d4c67e6d6f93c5
    Author: Kendell Clement <[email protected]>
    Date:   Sat Apr 22 23:41:58 2023 -0400

        Raise exceptions from within futures in plot_pool

    commit 7e807a60de2a9d18bccd034b87106ceaf7153338
    Author: Kendell Clement <[email protected]>
    Date:   Sat Apr 22 23:38:56 2023 -0400

        Fix future pandas indexing warning

        Pandas error was "FutureWarning: Calling float on a single element Series is deprecated and will raise a TypeError in the future. Use float(ser.iloc[0]) instead"

    commit 304a92aa7a7ef8c705cb070dce25d9a2e5745ba9
    Author: Cole Lyman <[email protected]>
    Date:   Thu Apr 20 13:59:27 2023 -0600

        Remove debug print statements fixes #295 (#297)

        The format string option used here is only available in Python version >=3.8.

    commit 478c06f784603e96d20f96e91993fdcc4ac35c8a
    Author: Kendell Clement <[email protected]>
    Date:   Thu Apr 13 12:09:26 2023 -0400

        Update plotCustomAllelePlot.py script for #292 (#293)

        Update type of 'max_rows' param to int
        Fix location of 'args' in crispresso2_info object

    commit bcdae39e05d530f4a4e78738c3b30f7664981919
    Author: Kendell Clement <[email protected]>
    Date:   Mon Mar 27 13:18:34 2023 -0400

        Update pooled parameter format

    commit 546446e36e7e68b527767d6c31ec341a49df2059
    Author: Kendell Clement <[email protected]>
    Date:   Tue Feb 14 16:26:23 2023 -0500

        Fix running plots in parallel (#286)

        The reason the plots were running slower before this change is because I was
        calling the plot function, not passing it to `submit`. So it was essentially
        running in serial, but worse because it was still spinning up/down the
        processes.

        Co-authored-by: Cole Lyman <[email protected]>

    commit d75f32a2eb5aeaaee866c09e5655a3e27af8b1a1
    Author: kclem <[email protected]>
    Date:   Fri Feb 10 15:45:15 2023 -0500

        Fix #283 to avoid filename collisions

        Previously, amplicon names longer than 21 characters were truncated, but the check for uniqueness wasn't working, so some plot files would be overwritten. This fixes the filename collision and enforces uniqueness in reference filename prefixes. Thanks @mbiokyle29
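
A minimal sketch of truncation with enforced uniqueness, assuming a numeric suffix is appended on collision. The function name `unique_prefixes` and the suffix scheme are assumptions; the commit does not say how CRISPResso disambiguates the prefixes.

```python
def unique_prefixes(names, max_len=21):
    """Map each name to a truncated prefix, adding a suffix when prefixes collide."""
    seen = set()
    prefixes = {}
    for name in names:
        base = name[:max_len]
        candidate = base
        counter = 2
        # Keep trying suffixes until the prefix has not been handed out yet
        while candidate in seen:
            candidate = "{}_{}".format(base, counter)
            counter += 1
        seen.add(candidate)
        prefixes[name] = candidate
    return prefixes


long_a = "A" * 21 + "_first"
long_b = "A" * 21 + "_second"
prefix_map = unique_prefixes([long_a, long_b, "short"])
```

Without the `seen` check both long names would map to the same 21-character prefix, and their plot files would overwrite each other.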

    commit e577318006cd17b2725bd028e5e56634c6eb829a
    Author: kclem <[email protected]>
    Date:   Mon Feb 6 16:37:25 2023 -0500

        Case-insensitive headers accepted in CRISPRessoPooled

    commit d34927620a4a6126a9988b3041e76f60728abbfe
    Author: Kendell Clement <[email protected]>
    Date:   Tue Jan 31 13:48:33 2023 -0500

        Fix print statement in CORE

    commit ee88b7ed89c395f68225a50dea44a2ad69d5e9a5
    Author: Kendell Clement <[email protected]>
    Date:   Tue Jan 31 13:22:51 2023 -0500

        Version bump to 2.2.12

    commit 1d4679c72d0c8b4154317c9aff5179217198e2d7
    Author: Kendell Clement <[email protected]>
    Date:   Tue Jan 31 13:01:31 2023 -0500

        Status Updates + Pooled Mixed Mode Update (#279)

        * Implement logging handler to overwrite the latest log status to file

        * Add StatusHandler to CRISPRessoCORE log

        This will take the latest log output and write it to a file (`status.txt`), the
        catch being that with each log the file is overwritten so that one can easily
        tell where CRISPResso currently is and what the error is (if any). These changes
        include some slight refactoring in order to accommodate any potential parameter
        exceptions.

        * Add StatusHandler to CRISPRessoBatch and refactor `logger.warn` to `warn`

        * Add StatusHandler to CRISPRessoPooled and a little refactoring

        * Implement `percent_complete` to the status log

        * Add StatusHandler to CRISPRessoAggregate log

        * Add StatusHandler to CRISPRessoCompare log

        * Add StatusHandler to CRISPRessoPooledWGSCompare log

        * Add StatusHandler to CRISPRessoWGS log

        * Rename `status.txt` to `CRISPResso_status.txt`

        * Modify status log names to match the tool they are generated from

        * Add percent_complete stages to CRISPRessoCORE

        These also include log statements of each plot that is being generated as well
        as fixing some variable name collisions with `ind`.

        * Format the percentage in the log to be 2 decimal places

        * Change all plotting logs from `info` to `debug` and simplify progress

        This refactors how the progress of the plots is calculated, making it much
        simpler. Before this change we would have had to keep track of the number of
        times `percent_complete` was output, but now it simply updates the percent
        complete after each amplicon is finished processing. Hopefully this will make
        things easier to maintain even though it will be a little less "accurate" (not
        sure how accurate the original implementation was...).

        * Implemented shared console log handler across all CRISPResso* calls

        This allows for easy changes to logging formatting, which was inspired by having
        to change the default logging level. The default logging level needs to be set
        at `logging.DEBUG` in order for the debug log statements to not be ignored for
        the running and status logs.

        * Add ability to set the verbosity level to each CRISPResso* tool

        This allows users to set a verbosity level between 1 and 4 using the
        `-v`/`--verbosity` CLI parameter. If the `--debug` flag is present, then the
        level will default to 4, being the most verbose.

        * Implement showing the last seen `percent_complete` when none is provided

        * Keep track of and log when multiple parallel runs are completed

        These changes modify `CRISPRessoMultiProcessing.run_crispresso_cmds` such that
        we can now display when a run is completed. This potentially breaks how
        signals and interrupts are handled with multiple runs happening, but this needs
        to be reviewed.

        * Add debug and percentage complete to CRISPRessoBatch

        * Add percent complete to CRISPRessoPooled

        * Add debug and percent_complete message to CRISPRessoAggregate

        * Add `percent_complete` to CRISPRessoCompare

        * Add `percent_complete` to CRISPRessoPooledWGSCompare

        * Add status and `percent_complete` to CRISPRessoMeta

        * Add `verbosity` arguments to CRISPRessoCompare and CRISPRessoPooledWGSCompare

        * Fixing documentation to match pooled headers

        * Header removal bug fix change documentation to guide_seq

        * Update documentation and help feature for CRISPRessoPooled

        * Remove extra newlines from CRISPRessoPooled -h

        * Make variable names as clear as my firstborn child's name

        * Update one more variable name

        * Fix bug to flow CRISPRessoPooled options to sub command

        * Make amplicon file args variable name clear

        * Update how parameters are set and retrieved from parameter object

        The refactor in the previous commit changed the type of the arguments to a
        dictionary which doesn't have the parameters as attributes, and this commit
        fixes that error.

        * Add note in output header for change in default CRISPRessoPooled

        In the next release (2.3.0) the `--demultiplex_only_at_amplicons` will be the
        default when running in mixed-mode. This is to allow for inexact alignments of
        the reads and the amplicons to the genome. For more context, see this issue
        https://github.com/pinellolab/CRISPResso2/issues/276

        * Clarify the verbosity parameter help message

        * Separate out parameters to `normalize_name` in CRISPRessoCORE

        * Separate out parameters to `normalize_name` in CRISPRessoWGS

        * Separate out parameters to `normalize_name` in CRISPRessoPooled

        * Separate out parameters to `normalize_name` in CRISPRessoCompare

        * Fix bug in CRISPRessoPooled by replacing `database_id` with `normalize_name`

        * Refactor `run_crispresso_cmds` to not require a `logger`

        This commit implements the functionality to make the `logger` object optional by
        seeing which module called the `run_crispresso_cmds` function and obtaining the
        correct object from that module name.

        The function also immediately returns when no commands are passed to it.

        * Add amplicon name to plotting debug statements in CRISPRessoCORE

        ---------

        Co-authored-by: Cole Lyman <[email protected]>
        Co-authored-by: Cole Lyman <[email protected]>
        Co-authored-by: Cole Lyman <[email protected]>
        Co-authored-by: Samuel Nichols <[email protected]>
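
The status-file idea in this commit message (overwrite one file with the latest log message, and reuse the last seen `percent_complete` when a message carries none) can be sketched with a custom `logging.Handler`. The class name `SketchStatusHandler`, the use of `extra=` to carry `percent_complete`, and the output format are assumptions; this is not the `StatusHandler` from the commit.

```python
import logging
import os
import tempfile


class SketchStatusHandler(logging.Handler):
    """Overwrite a status file with the most recent log message."""

    def __init__(self, path):
        super().__init__()
        self.path = path
        self.last_percent = 0.0

    def emit(self, record):
        percent = getattr(record, "percent_complete", None)
        if percent is not None:
            self.last_percent = float(percent)
        # Mode "w" truncates the file, so it always holds only the latest status
        with open(self.path, "w") as handle:
            handle.write(
                "{:.2f}% complete: {}\n".format(self.last_percent, record.getMessage())
            )


status_path = os.path.join(tempfile.mkdtemp(), "CRISPResso_status.txt")
logger = logging.getLogger("status_sketch")
logger.setLevel(logging.DEBUG)  # otherwise debug messages never reach the handler
logger.propagate = False
logger.addHandler(SketchStatusHandler(status_path))

logger.info("Aligning reads", extra={"percent_complete": 50})
logger.debug("Plotting amplicon")  # no percentage given: the last seen value is reused

with open(status_path) as handle:
    status_text = handle.read()
```

Setting the logger level to `logging.DEBUG` matches the note above that debug statements would otherwise be ignored by the status log.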

    commit ff7eca76e6a3a08af4ac18ac4e88d20f2a06b1f9
    Author: Kendell Clement <[email protected]>
    Date:   Thu Jan 26 15:27:27 2023 -0500

        CRISPRessoPooled custom header fix (#278)

        * Fixing documentation to match pooled headers

        * Header removal bug fix change documentation to guide_seq

        * Update documentation and help feature for CRISPRessoPooled

        * Remove extra newlines from CRISPRessoPooled -h

        * Make variable names as clear as my firstborn child's name

        * Update one more variable name

        Co-authored-by: Samuel Nichols <[email protected]>

    commit 104866e1080c973bb025d1a5ba59b19dca1658af
    Author: Cole Lyman <[email protected]>
    Date:   Thu Jan 5 14:00:26 2023 -0700

        Fix deprecated numpy type names (fixes #269) (#270)

        In the most recent version of numpy (1.24) some of the types have been
        deprecated. This commit fixes these errors.

    commit 58a8e42df88b66fad6b4f6ad04a5b9d9d43d01b4
    Author: Cole Lyman <[email protected]>
    Date:   Thu Jan 5 06:49:35 2023 -0700

        Add snippet about installing CRISPResso2 via bioconda on Apple silicon (#274)

        I have suffered enough trying to debug my installation, so hopefully this helps
        someone else.

        Co-authored-by: Cole Lyman <[email protected]>

    commit b9851e98104602eb78c2b384105267624295e9d3
    Author: Cole Lyman <[email protected]>
    Date:   Thu Dec 22 13:30:23 2022 -0700

        Fix bug when pooled bam is input (#265)

        This change checks to see if a bam file was input, and if so it doesn't try to
        remove any intermediate files because there aren't any.

        Co-authored-by: Cole Lyman <[email protected]>

    commit b822612642043e75a19042941f69b457ce51f517
    Author: Kendell Clement <[email protected]>
    Date:   Mon Dec 19 15:26:45 2022 -0500

        Delete vscode settings

    commit b99aa624dec68ef7d19264340ce0cafa829625f4
    Author: Kendell Clement <[email protected]>
    Date:   Mon Dec 19 13:29:14 2022 -0500

        Clarify input param help for pooled bam

    commit 3fae1e8b821ec6b1890bff6561fa8fa67dc49a04
    Author: Kendell Clement <[email protected]>
    Date:   Mon Dec 19 13:28:54 2022 -0500

        Fix #235 - Cigar string is * if read unaligned

        Previously, the bam would set the cigar string to 0 if the read was unaligned. This breaks the sam->bam conversion and causes the errors in #235.
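
For reference, the SAM format uses `*` for an unavailable CIGAR string. A minimal sketch of writing an unaligned record with its 11 mandatory fields follows; the function name `unaligned_sam_line` is hypothetical and not taken from CRISPResso.

```python
def unaligned_sam_line(read_name, seq, qual):
    """Build a SAM record for an unaligned read (flag 4, CIGAR '*')."""
    fields = [
        read_name,  # QNAME
        "4",        # FLAG: segment unmapped
        "*",        # RNAME: no reference
        "0",        # POS
        "0",        # MAPQ
        "*",        # CIGAR: unavailable, written as '*' rather than 0
        "*",        # RNEXT
        "0",        # PNEXT
        "0",        # TLEN
        seq,        # SEQ
        qual,       # QUAL
    ]
    return "\t".join(fields)


sam_line = unaligned_sam_line("read1", "ACGT", "IIII")
```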

    commit c65ba07dc5a983453cdf7bb1e27005230dac6f1b
    Author: Cole Lyman <[email protected]>
    Date:   Thu Dec 8 13:48:17 2022 -0700

        Add deprecation notice (#260)

        * Add FLASh and Trimmomatic deprecation notice to CLI output

        * Add Edilytics email address to CLI output

    commit 2a30e5a45f5350ee7c6435bce1cd4edc4d31668a
    Author: Kendell Clement <[email protected]>
    Date:   Tue Dec 6 12:16:19 2022 -0500

        Format filterReadsOnSequencePresence script

    commit 9d764414edd88a46ad5e4f496e4f1c8d5d60ce3e
    Author: Kendell Clement <[email protected]>
    Date:   Fri Dec 2 22:12:54 2022 -0500

        Clarify default CRISPRessoPooled settings for use_legacy_bowtie2_options_string

    commit 9ddea40f7f02b546941ddaa4c71fc5283075051a
    Author: kclem <[email protected]>
    Date:   Mon Nov 14 10:33:04 2022 -0500

        Add check for prime editing extension sequence in prime edited sequence

        If the user specifies prime_editing_override_prime_edited_ref_seq, it might not contain the extension sequence (for example, if the extension sequence is not provided in the appropriate orientation), so check for that here. The extension sequence should be provided as the reverse complement of the prime-edited sequence.

    commit 152f2dd5001da7090641ee8a1326bde9f7e8104e
    Author: kclem <[email protected]>
    Date:   Wed Nov 9 11:53:41 2022 -0500

        Version bump to 2.2.11a

    commit 9ed356e3a0c6c316d0860d121772f80ddca6de1d
    Author: kclem <[email protected]>
    Date:   Wed Nov 9 11:47:30 2022 -0500

        Add param to override prime editing sequence checks

        CRISPResso checks that prime editing guides are provided in the proper orientation (e.g. pegRNA 3'->5', spacer sequence 5'->3') and checks these orientations by alignment. Sometimes, the alignment can be better in the opposite direction, and this parameter allows these checks to be overridden. Otherwise, these checks would halt the program and produce the output 'The prime editing pegRNA spacer sequence appears to be given in the 3\'->5\' order. The prime editing pegRNA spacer sequence (--prime_editing_pegRNA_spacer_seq) must be given in the RNA 5\'->3\' order.'

    commit 39dd80afb98a22b7edb6f801c363d86bb77eeb5b
    Author: kclem <[email protected]>
    Date:   Wed Nov 9 10:06:51 2022 -0500

        Update filterReadsOnSequencePresence.py

    commit fe55526927e3fb6e17c9a8a6f59c7057bc1e14eb
    Author: Kendell Clement <[email protected]>
    Date:   Mon Nov 7 22:25:16 2022 -0500

        Add script to filter input based on sequence presence

    commit 713e57a19c35180035ca35e11a5820065eda0198
    Author: Kendell Clement <[email protected]>
    Date:   Tue Oct 18 16:02:26 2022 -0400

        Allow spaces in read names for CRISPRessoWGS

    commit 39ce008bdddccdd8229c0ba185dce78bc2f66968
    Author: Cole Lyman <[email protected]>
    Date:   Sat Oct 8 21:09:58 2022 -0600

        Fix typo of CRISPResssoPlot when plotting nucleotide quilt (#250)

    commit 6a2b342c8503b7327c0a2414edfbd16912d60ca5
    Author: Kendell Clement <[email protected]>
    Date:   Sat Oct 8 23:08:47 2022 -0400

        Batch amplicon plots (#251)

        * Error out if HDR amplicon matches existing amplicon

        * Add check for amplicon sequence uniqueness

        * Fix bug with bam_input not having bam_output

        * Test for no returned lines in auto mode, version bump to 2.2.11

        * Fix pandas deprecation of df.append

    commit 726b2b93d6e419a1b0aa6a968c97edc55b4cc5a8
    Author: Kendell Clement <[email protected]>
    Date:   Thu Oct 6 16:32:02 2022 -0400

        Fix CRISPRessoBatch plot pool bug when plots are suppressed

    commit 7e5049c4dfb88cbc87c91935a91d1f51120a10c2
    Author: Cole Lyman <[email protected]>
    Date:   Wed Sep 21 21:04:51 2022 -0600

        Fix batch quilt plot name (#249)

        This fixes an incorrectly named allele quilt plot input in CRISPRessoBatch.

    commit 1821ca5029c5a1485733f13ab3f2048b4f1fa04e
    Author: Kendell Clement <[email protected]>
    Date:   Thu Sep 15 15:49:08 2022 -0400

        Version bump to 2.2.10

    commit c5f79aebfc1ae209f4ee320df250eed89a02787c
    Author: Cole Lyman <[email protected]>
    Date:   Wed Sep 14 14:24:55 2022 -0600

        Parallel plot refactor (#247)

        * Fix duplicate plotting in CRISPRessoBatch aggregate

        * Refactor multiprocessing plots in CRISPRessoBatch

        * Refactor multiprocessing plots in CRISPRessoCORE

        * Refactor multiprocessing plots for CRISPRessoAggregate

    commit 4ed5e24e6cc1dd8068e2391573ae2438acd32db2
    Author: Kendell Clement <[email protected]>
    Date:   Tue Sep 13 14:12:11 2022 -0400

        print files in curr dir if Aggregate can't find files

    commit ce25bc06f29988e7a10afd0b6a09ba0caf0950e0
    Author: Kendell Clement <[email protected]>
    Date:   Mon Sep 12 10:32:57 2022 -0400

        Spelling typo

    commit c15f01c75083403f17c58c121b2afe97e9f2a1ec
    Author: Kendell Clement <[email protected]>
    Date:   Tue Sep 6 17:49:52 2022 -0400

        Add helper function to create alignment scoring matrix

        New scoring matrix can be created using CRISPResso2Align.make_matrix()

    commit c80f82838c5a228b79ad4484092877cfee08e02c
    Author: Cole Lyman <[email protected]>
    Date:   Mon Aug 22 18:28:33 2022 -0600

        Add `zip_output` (#240)

        * Making zip of results

        * Zip command added, if zip is true place_report_in_output_folder is also true, zip removes all files while zipping

        * Adding --zip to compare and pooled/wgs compare

        * Add more formatting changes to CRISPRessoShared

        * Refactoring propagate_crispresso_options so only one version exists

        * Zip added to arguments_to_ignore and warning added when changing arguments

        * Restore styling

        * Update README to include --zip

        * Rename --zip to --zip_output

        * Change --zip to --zip_output in CompareCORE and PooledWGSCompareCORE

        * Bug fix arg to args

        Co-authored-by: Samuel Nichols <[email protected]>

    commit 5de3d7286d8e33c7cf4d3615fce715806e72f511
    Author: Kendell Clement <[email protected]>
    Date:   Thu Aug 11 21:42:34 2022 -0400

        Fix fix to aggregate for CRISPRessoWGS

    commit a2294c266f43b14969a5d6474076f31a77a57173
    Author: Kendell Clement <[email protected]>
    Date:   Thu Aug 11 21:40:50 2022 -0400

        Fix bug in aggregate for WGS

    commit 7ce3eb4abe4b8ceac933272ac9cb16a8bedf26a3
    Author: Kendell Clement <[email protected]>
    Date:   Mon Aug 8 21:53:45 2022 -0400

        Update CRISPRessoWGS to allow non-word characters in region names

    commit 040ac0033d6e250f4e3a412101874cf5e914e08a
    Author: kclem <[email protected]>
    Date:   Mon Aug 8 16:04:59 2022 -0400

        Enable processing of cram files by CRISPRessoWGS

        Adds --reference to samtools view when viewing cram files

    commit cf112a0caba8789e28530cc09171285ec6ea9b4c
    Author: kclem <[email protected]>
    Date:   Mon Aug 8 14:55:46 2022 -0400

        Auto amplicon detection for interleaved input

        Enables processing of interleaved fastq files for guess_guides and guess_amplicons, as well as get_most_frequent_reads. When interleaved input is present, the input is first separated into R1/R2 files, then processing is performed.
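
The separation step can be sketched as below, assuming strictly alternating four-line records (R1, R2, R1, R2, ...). The function name `split_interleaved` is hypothetical, and it works on lists of lines rather than writing the R1/R2 files that the commit describes.

```python
def split_interleaved(lines):
    """Split interleaved FASTQ lines into (r1_lines, r2_lines)."""
    r1_lines, r2_lines = [], []
    record = []
    n_records = 0
    for line in lines:
        record.append(line)
        if len(record) == 4:
            # Even-numbered records belong to R1, odd-numbered ones to R2
            target = r1_lines if n_records % 2 == 0 else r2_lines
            target.extend(record)
            record = []
            n_records += 1
    if record or n_records % 2:
        raise ValueError("input is not a whole number of read pairs")
    return r1_lines, r2_lines


interleaved = [
    "@pair1/1", "ACGT", "+", "IIII",
    "@pair1/2", "TTGA", "+", "IIII",
    "@pair2/1", "GGCC", "+", "IIII",
    "@pair2/2", "CCAA", "+", "IIII",
]
r1, r2 = split_interleaved(interleaved)
```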

    commit 4ba524dc7b947feca8a0f743837844f9febc2171
    Author: Cole Lyman <[email protected]>
    Date:   Thu Aug 4 11:32:11 2022 -0600

        Potential fix for aggregate plots in Batch mode (#237)

    commit 6097a8a104d3f156ef7c08e196ac37e32bf04c71
    Author: Kendell Clement <[email protected]>
    Date:   Thu Jul 21 22:45:48 2022 -0400

        Fix pct_vectors in crispresso2_info json object

    commit 65a079d86d6f386793397398f839c46014b54543
    Author: Kendell Clement <[email protected]>
    Date:   Wed Jul 20 23:46:37 2022 -0400

        Fix more readme spelling bugs

    commit e817376ecd54cdea1f29e303ca25b9e7d1d38333
    Author: Kendell Clement <[email protected]>
    Date:   Wed Jul 20 23:42:23 2022 -0400

        Fix bug in readme spelling

    commit 49740ba1d66ed6d13a9e154b8b17bc8b5186581d
    Author: Kendell Clement <[email protected]>
    Date:   Wed Jul 20 16:10:09 2022 -0400

        Fix loading of crispresso info from WGS and Pooled

    commit b68a43271115251b18e8955e285ccc18f549e8cd
    Author: Kendell Clement <[email protected]>
    Date:   Thu Jul 14 14:11:04 2022 -0400

        Add plotly to dockerfile

    commit b0b7d41d697304d0d5fc93e3346c9de1b98ba41d
    Author: Kendell Clement <[email protected]>
    Date:   Thu Jul 14 14:10:00 2022 -0400

        Fix #231 Allow N's in bam output (Try 2)

    commit c460b3e73fd06a230dbac2e37c86b833144ebf94
    Author: Kendell Clement <[email protected]>
    Date:   Thu Jul 14 14:09:10 2022 -0400

        Revert "Fix #231 Allow N's in bam output"

        This reverts commit 2f6ad1dbe05210af9ccc1b1f17783cd212a888d3.

    commit 2f6ad1dbe05210af9ccc1b1f17783cd212a888d3
    Author: Kendell Clement <[email protected]>
    Date:   Thu Jul 14 13:52:37 2022 -0400

        Fix #231 Allow N's in bam output

    commit 0a2419e518dc9b3520058c3927f98b31cd51347e
    Author: Cole Lyman <[email protected]>
    Date:   Fri Jul 8 21:10:01 2022 -0600

        Fix bug when name is provided instead of amplicon_name in pooled input file (#229)

        Also, raise an exception (instead of incorrectly executing) when there are not
        enough matched parameters in the pooled input file.

    commit cb58212379803788c04ca5793baaa760cbbeaa81
    Author: Cole Lyman <[email protected]>
    Date:   Fri Jul 8 21:09:49 2022 -0600

        Fix bug…
mbowcut2 added a commit to edilytics/CRISPResso2 that referenced this pull request Apr 15, 2024
commit a98d35a5d38298fa68a83aecaf80b1cb9023b5a5
Author: McKay <[email protected]>
Date:   Mon Apr 15 14:30:55 2024 -0600

    Squashed commit of the following:

    commit 06f48ed73228580cfedaab389a6db55a62456b97
    Author: McKay <[email protected]>
    Date:   Mon Apr 15 14:02:58 2024 -0600

        c2pro installation in dockerfile

    commit 38dd5150a785e65edf7308bed35001b49bbbf017
    Author: McKay <[email protected]>
    Date:   Mon Apr 15 13:20:46 2024 -0600

        removed &&

    commit ad38cee6ede27affaae6862173e912c999f32ce1
    Author: McKay <[email protected]>
    Date:   Mon Apr 15 12:12:12 2024 -0600

        moved d3 import to end

    commit c2ba22f35c3e0a7775b3b89405c3c5ca38b3c4ee
    Author: McKay <[email protected]>
    Date:   Mon Apr 15 12:11:03 2024 -0600

        removed duplicate imports
        leave d3 in bottom
        plotly at top

    commit 44cbe598affd5176d3209fd1901db0d0c438f625
    Author: McKay <[email protected]>
    Date:   Fri Apr 12 16:07:18 2024 -0600

        fixed batch rendering for d3

    commit 418a6205ce52d8eb45799b34bcb47e3e5372d726
    Author: McKay <[email protected]>
    Date:   Fri Apr 12 15:47:45 2024 -0600

        fixed d3 plot 2b rendering

    commit d4e00703ff6f023606505b787e5ddc31772d9e8c
    Author: McKay <[email protected]>
    Date:   Wed Apr 10 15:52:05 2024 -0600

        move C2Pro install before app run

    commit e8f1947a3e5e5106d74f9200a6dac93616b171bd
    Author: McKay <[email protected]>
    Date:   Tue Apr 9 17:21:40 2024 -0600

        fixed guardrails, partials path

    commit 0a99c237b66970053a82221dba4b5aa6c2d2b568
    Author: McKay <[email protected]>
    Date:   Tue Apr 9 13:33:40 2024 -0600

        fastp, htmx, jquery dependencies

    commit 62e8e66f62ddc226a470e82ee761279fb957c71d
    Author: McKay <[email protected]>
    Date:   Tue Apr 9 12:15:49 2024 -0600

        removed commented code

    commit 11fb95e02f6133df827798098765b07043723a4f
    Author: McKay <[email protected]>
    Date:   Tue Apr 9 11:19:58 2024 -0600

        reports changes

    commit 3220cd8c6c031c1710c72c9cf0eb7d6070757bd3
    Author: McKay <[email protected]>
    Date:   Mon Apr 8 15:50:54 2024 -0600

        Squashed commit of the following:

        commit 53fecf71977f10c0d643887bf110cf7cbf044b8e
        Author: Sam <[email protected]>
        Date:   Thu Apr 4 15:35:39 2024 -0600

            Squashed commit of the following:

            commit 6f4b0ad885e1d72413a034bf7abaaa0360a3b0c4
            Author: Samuel Nichols <[email protected]>
            Date:   Thu Apr 4 15:18:09 2024 -0600

                Batch d3 clean (#55)

                * imports C2Pro plots if available

                * added --use_matplotlib flag

                * added C2Pro
                matched api function signatures

                * added api args for plotly

                * added **kwargs

                * renamed config to custom_config, more specificity

                * added backend flag for plotly kaleido

                * added pro_installed boolean for templates, added plotly dependency to report templates

                * Squashed commit of the following:

                commit c909ea3b34e87ce637e00dac075d2bb2f8bfb954
                Author: McKay <[email protected]>
                Date:   Thu Feb 15 15:55:23 2024 -0700

                    added plotly dependency for pro

                commit 76b3601f6a0144f100266153f1c999e0c5de65de
                Author: Samuel Nichols <[email protected]>
                Date:   Fri Jan 12 09:56:19 2024 -0700

                    Squashed commit of the following:

                    commit 603f2eff9d1aa21ae95f3e134da303b8018d3a33
                    Author: Samuel Nichols <[email protected]>
                    Date:   Fri Jan 12 09:48:20 2024 -0700

                        fix guardrails partial

                    commit 22fc03183a8070c30dfb74d5c23575ac19019855
                    Author: Samuel Nichols <[email protected]>
                    Date:   Fri Jan 12 08:54:01 2024 -0700

                        Add guardrail partial

                    commit e55f6b21972b578261bc5a864ce1d653d98f9e34
                    Author: Samuel Nichols <[email protected]>
                    Date:   Mon Jan 8 07:50:59 2024 -0700

                        Functional guardrails, needs reports update

                    commit 6e968e9699ed59a47d88191d03768e042d8b60a4
                    Merge: 32b49685 e948ce10
                    Author: Samuel Nichols <[email protected]>
                    Date:   Mon Dec 18 13:34:36 2023 -0700

                        Merge branch 'guardrails-clean-history' of https://github.com/edilytics/CRISPResso2 into guardrails-clean-history

                    commit 32b49685da320501dad2b0ebbb57887b66220ba8
                    Author: Samuel Nichols <[email protected]>
                    Date:   Fri Dec 15 15:27:04 2023 -0700

                        Include guardrail functions

                    commit 4e309cf6f732565d635de3d4c5d074ada3027e2d
                    Author: Cole Lyman <[email protected]>
                    Date:   Mon Dec 18 10:51:55 2023 -0700

                        Refactor to use CRISPRessoReports module

                    commit e648dc087c0055bc5d2fca13c64071a371dea941
                    Author: Cole Lyman <[email protected]>
                    Date:   Mon Dec 18 10:51:11 2023 -0700

                        Add CRISPRessoReports subtree

                    commit e948ce107ebb0d1d99010ed12e937f34b5e607d4
                    Author: Samuel Nichols <[email protected]>
                    Date:   Fri Dec 15 15:27:04 2023 -0700

                        Include guardrail functions

                    commit d33c748871a625facfe8d792e29c77ab9779138f
                    Author: Kendell Clement <[email protected]>
                    Date:   Tue Nov 7 16:31:06 2023 -0700

                        Include parameter --assign_ambiguous_alignments_to_first_reference in readme

                    commit a1435f7f491a6a61434f3051e39f39a4c9bf1edc
                    Author: Kendell Clement <[email protected]>
                    Date:   Wed Oct 11 17:17:30 2023 -0600

                        Enable quantification by sgRNA (#348)

                        This PR includes:
                        - storing the sgRNA-specific editing locations in the crispresso2_info object. Previously, each amplicon would record the indices of quantification windows across the guide, but not for individual guides. This stores the information for each guide in crispresso2_info['results']['refs'][reference_name]['sgRNA_include_idxs']
                        - a script (count_sgRNA_specific_edits.py) to parse through an allele table output from a completed CRISPResso run (`--write_detailed_allele_table` flag required) to count edits in each sgRNA separately.

                        I don't have a good double-edited sample handy, but it can be run on the demo HDR data [hdr.fastq.gz](http://crispresso.pinellolab.org/static/demo/hdr.fastq.gz) using the command:

                        ```

                        CRISPResso -r1 hdr.fastq.gz -a acatttgcttctgacacaactgtgttcactagcaacctcaaacagacaccatggtgcatctgactcctgTggagaagtctgccgttactgccctgtggggcaaggtgaacgtggatgaagttggtggtgaggccctgggcaggttggtatcaaggtta -e acatttgcttctgacacaactgtgttcactagcaacctcaaacagacaccatggtgcaCctgactccGgaggagaagtctgccgttactgcGctgtggggcaaggtgaacgtggatgaagttggtggtgaggccctgggcaggttggtatcaaggtta -c atggtgcatctgactcctgTggagaagtctgccgttactgccctgtggggcaaggtgaacgtggatgaagttggtggtgaggccctgggcag -g TGCACCATGGTGTCTGTTTG,GATGAAGTTGGTGGTGAGGCCC --write_detailed_allele_table  -n hdr3 -p max -gn guide1,guide2
                        ```

                        ```
                        python CRISPResso2/scripts/count_sgRNA_specific_edits.py -f CRISPResso_on_hdr3
                        ```

                        This produces:
                        ```
                        Processed 25000 alleles
                        Reference: Reference (2391/23415 modified reads)
                                UNMODIFIED: 21024
                                MODIFIED guide1: 2359
                                MODIFIED guide2: 32
                        Reference: HDR (856/1577 modified reads)
                                UNMODIFIED: 721
                                MODIFIED guide1: 854
                                MODIFIED guide1 + guide2: 1
                                MODIFIED guide2: 1
                        ```
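
As a quick sanity check on the example output, the per-guide counts can be tallied against the reported totals. The numbers below are copied from the output above; the dictionary layout and the helper `is_consistent` are only illustrative.

```python
# Counts copied from the example output above.
results = {
    "Reference": {
        "modified": 2391, "total": 23415, "unmodified": 21024,
        "by_guide": {"guide1": 2359, "guide2": 32},
    },
    "HDR": {
        "modified": 856, "total": 1577, "unmodified": 721,
        "by_guide": {"guide1": 854, "guide1 + guide2": 1, "guide2": 1},
    },
}


def is_consistent(entry):
    """Per-guide counts must sum to 'modified'; modified + unmodified must equal 'total'."""
    modified_sum = sum(entry["by_guide"].values())
    return (modified_sum == entry["modified"]
            and entry["unmodified"] + entry["modified"] == entry["total"])


checks = {name: is_consistent(entry) for name, entry in results.items()}
assigned = sum(entry["total"] for entry in results.values())
```

Both references are internally consistent, and together they account for 24992 of the 25000 processed alleles. The output does not say how the remaining 8 were classified.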

                    commit 2e3da02fdbed2fa8ae02a277763d65a502459827
                    Author: Cole Lyman <[email protected]>
                    Date:   Tue Oct 10 15:29:08 2023 -0600

                        changed tuple to list for matplotlib change (#31) (#346)

                        Co-authored-by: mbowcut2 <[email protected]>

                    commit cd3c332135fe4db0f9218e3d87263d5c65838ed9
                    Author: Kendell Clement <[email protected]>
                    Date:   Sun Oct 1 01:54:46 2023 -0600

                        rename script to camel case

                    commit 7c719d65fb36ac7654db9040f226564ea28fcab9
                    Author: Kendell Clement <[email protected]>
                    Date:   Sun Oct 1 01:53:44 2023 -0600

                        Add new script for counting high quality bases

                    commit f97cd2795e89464bcc9321ccfdbca3e6af2bcb4f
                    Author: Kendell Clement <[email protected]>
                    Date:   Thu Sep 14 15:15:30 2023 -0600

                        Prime editing alignment params (#336)

                        Adds two parameters to control alignment of pegRNA components: --prime_editing_gap_open_penalty and --prime_editing_gap_extend_penalty.

                        CRISPResso checks to see whether the pegRNA spacer and extension sequence are in the correct orientation, but sometimes they could align in the incorrect orientation with a higher score (e.g. via insertion of multiple gaps, whereas a single long gap would be preferred). Introducing these two parameters allows users to adjust the alignment parameters specifically for these prime-editing checks without adjusting the global alignment parameters which will be applied to reads that are aligned to the WT reference/prime-editing reference sequences.

                        The new prime_editing_gap_open_penalty is set to -50, a higher gap open penalty than the default needleman_wunsch_gap_open penalty (-20). This commit breaks backward-reproducibility, but mostly in the checking of pegRNA component orientation - so previously some CRISPResso runs would have failed and produced an error, but now they will (hopefully) succeed. To achieve complete backward reproducibility, add the flag --prime_editing_gap_open_penalty -20 to runs.
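
A small worked example of why a larger gap-open penalty favors a single long gap over several short ones. The scoring convention (one open penalty per gap plus one extend penalty per gapped base) and the extend value of -2 are assumptions made for illustration and may not match CRISPResso's aligner exactly.

```python
def affine_gap_cost(gap_lengths, gap_open, gap_extend):
    """Score contribution of gaps: one open penalty per gap, one extend penalty per gapped base."""
    return sum(gap_open + length * gap_extend for length in gap_lengths)


GAP_EXTEND = -2  # assumed value, for illustration only


def preference_for_single_gap(gap_open):
    """How much better one 6-base gap scores than three 2-base gaps."""
    one_long = affine_gap_cost([6], gap_open, GAP_EXTEND)
    three_short = affine_gap_cost([2, 2, 2], gap_open, GAP_EXTEND)
    return one_long - three_short


margin_default = preference_for_single_gap(-20)        # 40
margin_prime_editing = preference_for_single_gap(-50)  # 100
```

Under this convention the margin is twice the magnitude of the open penalty, so moving from -20 to -50 makes the single-gap alignment win by a wider margin over alignments that scatter several gaps.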

                    commit 64cbf36dae85cffa2c15e73f2a7ee8aa1077d917
                    Author: Cole Lyman <[email protected]>
                    Date:   Thu Sep 7 16:43:30 2023 -0600

                        Fix samtools piping (#325)

                        * Remove samtools pipe stderr to stdout

                        Sometimes some of the libraries that samtools depends on don't have the correct
                        version information, and as such samtools will report this to stderr when run.
                        Because we pipe the output of samtools, we expect it to be valid SAM format, but
                        when these library version messages are reported, it breaks CRISPRessoWGS.

                        * Remove extra spacing at end of lines and add missing comma in WGS

                        * Log stderr from samtools in CRISPRessoWGS
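
The effect of merging stderr into a piped stdout can be reproduced without samtools. In the sketch below a small Python child process stands in for the samtools call (a hypothetical stand-in so the example runs anywhere); the helper names are also illustrative.

```python
import subprocess
import sys

# Stand-in for a samtools call: writes one SAM-like line to stdout and a
# library warning to stderr. (Hypothetical command, not a real samtools invocation.)
fake_samtools = [
    sys.executable, "-c",
    "import sys; "
    "sys.stderr.write('warning: library version mismatch\\n'); "
    "sys.stdout.write('read1\\t0\\tchr1\\t100\\n')",
]


def run_keeping_streams_apart(cmd):
    """Capture stdout and stderr separately so warnings can be logged, not parsed."""
    proc = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                          text=True, check=True)
    return proc.stdout.splitlines(), proc.stderr.splitlines()


def run_merging_streams(cmd):
    """Redirect stderr into stdout: warnings end up mixed into the data stream."""
    proc = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                          text=True, check=True)
    return proc.stdout.splitlines()


sam_lines, log_lines = run_keeping_streams_apart(fake_samtools)
merged_lines = run_merging_streams(fake_samtools)
```

With the streams kept apart, `sam_lines` holds only the alignment record and `log_lines` can be sent to the logger. With the streams merged, the warning appears among the lines a SAM parser would try to read.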

                    commit 8feff4101f27406d9d88ace97d31a518276bff3f
                    Author: Cole Lyman <[email protected]>
                    Date:   Fri Sep 1 09:43:56 2023 -0600

                        Replace link to CRISPResso schematic with raw URL in README (#329)

                        * Replace link to CRISPResso schematic with raw URL

                        * Add new lines to the beginning of unordered lists

                    commit 2e9e6bff5bcc536d5e2ba1440d1ab96d9d47efd6
                    Author: Kendell Clement <[email protected]>
                    Date:   Thu Aug 10 00:52:12 2023 -0600

                        Try to unbreak CircleCI

                    commit ae5b95246cb0f6d66c4cbfb50cf8f5a9626b0827
                    Author: Kendell Clement <[email protected]>
                    Date:   Thu Aug 10 00:17:27 2023 -0600

                        Center command line text messages

                    commit 4d9c71ecf2248c9bb1e10430178dc318b6621c8b
                    Author: Kendell Clement <[email protected]>
                    Date:   Thu Aug 10 00:17:07 2023 -0600

                        Fix bug in prime-editing scaffold-incorporation plotting

                        If read is too short, scaffold incorporation detection will fail because it will check beyond the length of the read.

                    commit 2b36a1a5c35e8a93516ce8baf464595615e0f402
                    Author: Kendell Clement <[email protected]>
                    Date:   Wed Aug 9 15:29:48 2023 -0600

                        CRISPRessoPooled --compile_postrun_references bug fixes

                    commit 3e04d1d402bcf95edd39fc7c8c9af61bb380f9db
                    Author: Kendell Clement <[email protected]>
                    Date:   Tue Aug 8 23:30:15 2023 -0600

                        Fix missing ' in Pooled --demultiplex_only_at_amplicons

                    commit 06af527f9e2020c5cf251e7f1cec0b1eca1c1664
                    Author: Cole Lyman <[email protected]>
                    Date:   Mon Jul 24 10:47:46 2023 -0600

                        Sort pandas dataframes by # of reads and sequences so that the order is consistent (#316)

                        * Make sorting stable

                        * Including c files

                        * Sort by #Reads instead of %Reads to avoid floating point errors
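
The idea behind these changes can be shown in plain Python (the commit itself concerns pandas dataframes; the list-of-dicts layout and `sort_alleles` are assumptions for illustration). Sorting on an integer read count with the sequence as a tie-breaker gives the same order regardless of input order, whereas percentages are floats and can differ by rounding.

```python
def sort_alleles(rows):
    """Order by read count (descending), breaking ties by sequence (ascending)."""
    return sorted(rows, key=lambda row: (-row["reads"], row["seq"]))


rows = [
    {"seq": "TTGA", "reads": 3},
    {"seq": "ACGT", "reads": 5},
    {"seq": "ACCA", "reads": 3},
]
ordered = [row["seq"] for row in sort_alleles(rows)]
ordered_from_reversed = [row["seq"] for row in sort_alleles(rows[::-1])]

# Floating-point values that are equal on paper may not compare equal.
float_tie_is_unreliable = (0.1 + 0.2 != 0.3)
```

Because every tie in read count is resolved by the sequence, the result no longer depends on the order in which rows happened to arrive.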

                        ---------

                        Co-authored-by: Samuel Nichols <[email protected]>

                    commit de05533b3511a84f3b6b14fc2ef64db041613261
                    Author: Cole Lyman <[email protected]>
                    Date:   Thu Jul 6 13:54:45 2023 -0600

                        Fix multiprocessing lambda pickling (#311)

                        * Fix running plots in parallel

                        The reason the plots were running slower before this change is because I was
                        calling the plot function, not passing it to `submit`. So it was essentially
                        running in serial, but worse because it was still spinning up/down the
                        processes.

                        * Fix multiprocessing lambda pickling (#20)

                        * Refactor process_futures to be a dict

                        This makes debugging much easier because you can associate the arguments to the
                        future with the results.

                        * Fix the pickling error when running in multiprocessing

                        Only top-level functions (not lambdas) can be pickled to use in multiprocessing
                        pools, thus the lambdas are converted to a regular function.

                        * Further fixes to pickling multiprocessing error (#21)

                        * Refactor process_futures to be a dict

                        This makes debugging much easier because you can associate the arguments to the
                        future with the results.

                        * Fix the pickling error when running in multiprocessing

                        Only top-level functions (not lambdas) can be pickled to use in multiprocessing
                        pools, thus the lambdas are converted to a regular function.

                        * Use Counter instead of defaultdict in CRISPRessoCORE

                        * Update process_futures to dict in Batch and Aggregate
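
The pickling constraint mentioned above is easy to reproduce with the standard library. This sketch uses a hypothetical helper `can_pickle` and a stand-in function `double`; it is not taken from the CRISPResso code.

```python
import pickle


def double(x):
    """A regular, module-level function: pickled by reference to its name."""
    return 2 * x


def can_pickle(obj):
    """Return True if obj can be serialized with pickle."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


# A lambda has no importable name, so pickle cannot serialize it.
lambda_ok = can_pickle(lambda x: 2 * x)
```

Functions are pickled by name rather than by value, which is why replacing a lambda with a named top-level function such as `double` lets a process pool send it to its workers.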

                    commit ebb016dff46c280dce8c3c09e8ac0e0cc25d4d74
                    Author: Kendell Clement <[email protected]>
                    Date:   Mon Jul 3 17:12:09 2023 -0600

                        Enable CRISPRessoPooled multiprocessing when os allows multi-thread file append

                    commit 7285da0e987b77b72c8885bb35940e0f50c146bd
                    Author: Kendell Clement <[email protected]>
                    Date:   Fri Jun 23 16:50:33 2023 -0600

                        Fix print bug for invalid fastq

                    commit 9acdeac67441f9a1d55ac94b153bcb68fb89b92c
                    Author: kclem <[email protected]>
                    Date:   Wed Jun 21 16:03:48 2023 -0600

                        Slugify before creating filename - replaces invalid characters in batch names with _
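
A minimal sketch of this kind of replacement. The function name `slugify_name` and the exact set of characters treated as valid are assumptions; the project's own slugify may accept a different set.

```python
import re


def slugify_name(name):
    """Replace every character outside [A-Za-z0-9._-] with an underscore."""
    return re.sub(r"[^A-Za-z0-9._-]", "_", name)


safe_name = slugify_name("batch 1/sample:A")  # "batch_1_sample_A"
```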

                    commit f97e29c67de4c80b8d6b9cf334f363be4b514ade
                    Author: Cole Lyman <[email protected]>
                    Date:   Wed Jun 21 14:43:43 2023 -0600

                        Add verbosity argument to CRISPRessoAggregate (#18) fixes #306 (#307)

                        * Add verbosity argument to CRISPRessoAggregate (#18)

                        * Allow for amplicon and guide seqs to be some variant of NA in batch (#19)

                        This was discovered when attempting to infer amplicon sequences in batch mode on
                        the web interface, NAs were supplied for the amplicon sequences to the sub
                        CRISPResso commands.

                    commit 32e1e9797da5c3033cdc588e92f06b8813961953
                    Author: Mark Clement <[email protected]>
                    Date:   Wed Jun 21 14:01:00 2023 -0600

                        Allow for interrogation of overlapping sgRNA sites

                    commit 7248ba8c4deee125ad1ec12fdf1294a84d5f6f93
                    Author: Kendell Clement <[email protected]>
                    Date:   Mon Jun 12 12:16:47 2023 -0600

                        Check input fastq file format

                        Asserts input format of fastq files - including if gzipped files are missing the gz suffix.
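
One way such a check could work is to look at the gzip magic bytes rather than trusting the file extension. The helper names and messages below are hypothetical and only sketch the idea; the actual checks in CRISPResso may differ.

```python
import gzip
import os
import tempfile

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def looks_gzipped(path):
    """Detect gzip content from the file's first two bytes."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


def check_fastq(path):
    """Return 'ok' or a short description of the first problem found."""
    gzipped = looks_gzipped(path)
    if gzipped and not path.endswith(".gz"):
        return "gzipped content but no .gz suffix"
    opener = gzip.open if gzipped else open
    with opener(path, "rt") as handle:
        first = handle.readline()
    if not first.startswith("@"):
        return "first line does not start with '@'"
    return "ok"


with tempfile.TemporaryDirectory() as tmp:
    plain = os.path.join(tmp, "reads.fastq")
    with open(plain, "w") as handle:
        handle.write("@read1\nACGT\n+\nIIII\n")
    mislabeled = os.path.join(tmp, "reads_gz.fastq")
    with gzip.open(mislabeled, "wt") as handle:
        handle.write("@read1\nACGT\n+\nIIII\n")
    plain_status = check_fastq(plain)
    mislabeled_status = check_fastq(mislabeled)
```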

                    commit 83c8ab8f462e7d8c1d04c08c1a398b874f517251
                    Author: Kendell Clement <[email protected]>
                    Date:   Mon Jun 5 13:41:55 2023 -0600

                        Fix CRISPRessoArgParser

                    commit 14a2c8577f566e1b72d5f4e72cd6cd22079610be
                    Author: Kendell Clement <[email protected]>
                    Date:   Mon Jun 5 13:29:31 2023 -0600

                        Cosmetic updates for command-line use

                        - version bump to 2.2.13
                        - If no args are provided, the command line version will print out an abbreviated help message
                        - parameters can be excluded from CRISPRessoArgParser

                    commit 1cd54bc1d03360c3d8121ba9e66b3589fe1cf252
                    Author: Cole Lyman <[email protected]>
                    Date:   Thu May 11 14:31:47 2023 -0600

                        Fix multiprocessing error, don't start pool when only using single thread (#302)

                        * Update README to have consistent use of `--base_editor_output` (#16)

                        * Add files via upload

                        * Only start process pools when using multiple processes

                        This is mainly to solve the issue when running on AWS Lambda, but this should
                        improve single core performance overall.
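
The pattern described in the last item might look roughly like this. `run_all` is a hypothetical helper, not the project's function; it only creates a process pool when more than one process is requested.

```python
from concurrent.futures import ProcessPoolExecutor


def run_all(func, items, n_processes=1):
    """Apply func to each item, creating a process pool only when n_processes > 1."""
    if n_processes <= 1:
        # Single process: no pool start-up cost, and it works in environments
        # where worker processes cannot be created.
        return [func(item) for item in items]
    with ProcessPoolExecutor(max_workers=n_processes) as pool:
        return list(pool.map(func, items))


serial_result = run_all(abs, [-1, 2, -3])
```

When calling `run_all` with `n_processes` greater than 1 from a script, place the call under an `if __name__ == "__main__":` guard on platforms that spawn worker processes.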

                        ---------

                        Co-authored-by: Kendell Clement <[email protected]>

                    commit 92a705c939b370373a70cf6ae9f1616de33288b9
                    Author: Cole Lyman <[email protected]>
                    Date:   Thu May 11 14:31:06 2023 -0600

                        Update `base_editor` parameters in README and add Plot Harness (#301)

                        * Update README to have consistent use of `--base_editor_output` (#16)

                        * Add files via upload

                        ---------

                        Co-authored-by: Kendell Clement <[email protected]>

                    commit 7d46c4490235df45c5546b1b470e4e6a99727031
                    Author: Cole Lyman <[email protected]>
                    Date:   Wed May 10 15:41:33 2023 -0600

                        Clarify CRISPRessoWGS intended use (#303)

                        * Update README to have consistent use of `--base_editor_output` (#16)

                        * Add sample plotting jupyter notebook

                        * Add clarifying info to CRISPRessoWGS description

                        Clarify WGS usage

                    commit 833a701787bb47674b3e921c38cac6189c775cf7
                    Author: Kendell Clement <[email protected]>
                    Date:   Thu May 4 17:02:46 2023 -0400

                        Remove debug print statements

                    commit 712eb2a11825e8d36f2870deb12b35486bd633fb
                    Author: Kendell Clement <[email protected]>
                    Date:   Thu May 4 16:40:07 2023 -0400

                        Allow dashes in filenames resolve #73

                    commit a439f094745b2b5e7f032f0777d4c67e6d6f93c5
                    Author: Kendell Clement <[email protected]>
                    Date:   Sat Apr 22 23:41:58 2023 -0400

                        Raise exceptions from within futures in plot_pool

                    commit 7e807a60de2a9d18bccd034b87106ceaf7153338
                    Author: Kendell Clement <[email protected]>
                    Date:   Sat Apr 22 23:38:56 2023 -0400

                        Fix future pandas indexing warning

                        Pandas error was "FutureWarning: Calling float on a single element Series is deprecated and will raise a TypeError in the future. Use float(ser.iloc[0]) instead"

                    commit 304a92aa7a7ef8c705cb070dce25d9a2e5745ba9
                    Author: Cole Lyman <[email protected]>
                    Date:   Thu Apr 20 13:59:27 2023 -0600

                        Remove debug print statements fixes #295 (#297)

                        The format string option used here is only available in Python version >=3.8.

                    commit 478c06f784603e96d20f96e91993fdcc4ac35c8a
                    Author: Kendell Clement <[email protected]>
                    Date:   Thu Apr 13 12:09:26 2023 -0400

                        Update plotCustomAllelePlot.py script for #292 (#293)

                        Update type of 'max_rows' param to int
                        Fix location of 'args' in crispresso2_info object

                    commit bcdae39e05d530f4a4e78738c3b30f7664981919
                    Author: Kendell Clement <[email protected]>
                    Date:   Mon Mar 27 13:18:34 2023 -0400

                        Update pooled parameter format

                    commit 546446e36e7e68b527767d6c31ec341a49df2059
                    Author: Kendell Clement <[email protected]>
                    Date:   Tue Feb 14 16:26:23 2023 -0500

                        Fix running plots in parallel (#286)

                        The reason the plots were running slower before this change is because I was
                        calling the plot function, not passing it to `submit`. So it was essentially
                        running in serial, but worse because it was still spinning up/down the
                        processes.
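
The difference between calling the function and passing it to `submit` can be demonstrated in a few lines. This sketch uses a thread pool and a stand-in `make_plot` function so it runs anywhere; the original change concerned the project's own plotting pool.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

calls = {}


def make_plot(name):
    """Stand-in for a plotting function; records which thread ran it."""
    calls[name] = threading.current_thread().name


caller_thread = threading.current_thread().name

with ThreadPoolExecutor(max_workers=2) as pool:
    # Bug: make_plot("eager") runs right here in the caller, and its return
    # value (None) is what gets submitted to the pool.
    bad_future = pool.submit(make_plot("eager"))
    # Fix: pass the function and its arguments separately so a worker runs it.
    good_future = pool.submit(make_plot, "deferred")
    good_future.result()
    bad_error = bad_future.exception()  # the worker could not call None
```

In the buggy form all the work happens serially in the caller, while the pool still pays for starting and stopping its workers.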

                        Co-authored-by: Cole Lyman <[email protected]>

                    commit d75f32a2eb5aeaaee866c09e5655a3e27af8b1a1
                    Author: kclem <[email protected]>
                    Date:   Fri Feb 10 15:45:15 2023 -0500

                        Fix #283 to avoid filename collisions

                        Previously, amplicon names longer than 21 characters were truncated, but the check for uniqueness wasn't working, so it would overwrite some plot files. This fixes the filename collision and enforces uniqueness in reference filename prefixes. Thanks @mbiokyle29
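
A sketch of how truncated prefixes can be kept unique. The function `unique_prefixes` and the numeric-suffix scheme are assumptions for illustration, not the scheme CRISPResso uses; the input names are assumed to be distinct.

```python
def unique_prefixes(names, max_len=21):
    """Truncate names to max_len and add a numeric suffix when truncation collides."""
    seen = set()
    result = {}
    for name in names:
        prefix = name[:max_len]
        candidate = prefix
        counter = 1
        while candidate in seen:  # keep trying suffixes until the prefix is unused
            candidate = f"{prefix}_{counter}"
            counter += 1
        seen.add(candidate)
        result[name] = candidate
    return result


names = ["amplicon_with_a_long_name_A", "amplicon_with_a_long_name_B", "short"]
prefixes = unique_prefixes(names)
```

Both long names truncate to the same 21 characters, so without the uniqueness check their plot files would share a prefix and overwrite each other.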

                    commit e577318006cd17b2725bd028e5e56634c6eb829a
                    Author: kclem <[email protected]>
                    Date:   Mon Feb 6 16:37:25 2023 -0500

                        Case-insensitive headers accepted in CRISPRessoPooled

                    commit d34927620a4a6126a9988b3041e76f60728abbfe
                    Author: Kendell Clement <[email protected]>
                    Date:   Tue Jan 31 13:48:33 2023 -0500

                        Fix print statement in CORE

                    commit ee88b7ed89c395f68225a50dea44a2ad69d5e9a5
                    Author: Kendell Clement <[email protected]>
                    Date:   Tue Jan 31 13:22:51 2023 -0500

                        Version bump to 2.2.12

                    commit 1d4679c72d0c8b4154317c9aff5179217198e2d7
                    Author: Kendell Clement <[email protected]>
                    Date:   Tue Jan 31 13:01:31 2023 -0500

                        Status Updates + Pooled Mixed Mode Update (#279)

                        * Implement logging handler to overwrite the latest log status to file

                        * Add StatusHandler to CRISPRessoCORE log

                        This will take the latest log output and write it to a file (`status.txt`), the
                        catch being that with each log the file is overwritten so that one can easily
                        tell where CRISPResso currently is and what the error is (if any). These changes
                        include some slight refactoring in order to accommodate any potential parameter
                        exceptions.

                        * Add StatusHandler to CRISPRessoBatch and refactor `logger.warn` to `warn`

                        * Add StatusHandler to CRISPRessoPooled and a little refactoring

                        * Implement `percent_complete` to the status log

                        * Add StatusHandler to CRISPRessoAggregate log

                        * Add StatusHandler to CRISPRessoCompare log

                        * Add StatusHandler to CRISPRessoPooledWGSCompare log

                        * Add StatusHandler to CRISPRessoWGS log

                        * Rename `status.txt` to `CRISPResso_status.txt`

                        * Modify status log names to match the tool they are generated from

                        * Add percent_complete stages to CRISPRessoCORE

                        These also include log statements of each plot that is being generated as well
                        as fixing some variable name collisions with `ind`.

                        * Format the percentage in the log to be 2 decimal places

                        * Change all plotting logs from `info` to `debug` and simplify progress

                        This refactors how the progress of the plots is calculated, making it much
                        simpler. Before this change we would have had to keep track of the number of
                        times `percent_complete` was output, but now it simply updates the percent
                        complete after each amplicon is finished processing. Hopefully this will make
                        things easier to maintain even though it will be a little less "accurate" (not
                        sure how accurate the original implementation was...).

                        * Implemented shared console log handler across all CRISPResso* calls

                        This allows for easy changes to logging formatting, which was inspired by having
                        to change the default logging level. The default logging level needs to be set
                        at `logging.DEBUG` in order for the debug log statements to not be ignored for
                        the running and status logs.

                        * Add ability to set the verbosity level to each CRISPResso* tool

                        This allows users to set a verbosity level between 1 and 4 using the
                        `-v`/`--verbosity` CLI parameter. If the `--debug` flag is present, then the
                        level will default to 4, being the most verbose.

                        * Implement showing the last seen `percent_complete` when none is provided

                        * Keep track of and log when multiple parallel runs are completed

                        These changes modify `CRISPRessoMultiProcessing.run_crispresso_cmds` such that
                        we can now display when a run is completed. This potentially breaks how
                        signals and interrupts are handled with multiple runs happening, but this needs
                        to be reviewed.

                        * Add debug and percentage complete to CRISPRessoBatch

                        * Add percent complete to CRISPRessoPooled

                        * Add debug and percent_complete message to CRISPRessoAggregate

                        * Add `percent_complete` to CRISPRessoCompare

                        * Add `percent_complete` to CRISPRessoPooledWGSCompare

                        * Add status and `percent_complete` to CRISPRessoMeta

                        * Add `verbosity` arguments to CRISPRessoCompare and CRISPRessoPooledWGSCompare

                        * Fixing documentation to match pooled headers

                        * Header removal bug fix change documentation to guide_seq

                        * Update documentation and help feature for CRISPRessoPooled

                        * Remove extra newlines from CRISPRessoPooled -h

                        * Make variable names as clear as my firstborn child's name

                        * Update one more variable name

                        * Fix bug to flow CRISPRessoPooled options to sub command

                        * Make amplicon file args variable name clear

                        * Update how parameters are set and retrieved from parameter object

                        The refactor in the previous commit changed the type of the arguments to a
                        dictionary which doesn't have the parameters as attributes, and this commit
                        fixes that error.

                        * Add note in output header for change in default CRISPRessoPooled

                        In the next release (2.3.0) the `--demultiplex_only_at_amplicons` will be the
                        default when running in mixed-mode. This is to allow for inexact alignments of
                        the reads and the amplicons to the genome. For more context, see this issue
                        https://github.com/pinellolab/CRISPResso2/issues/276

                        * Clarify the verbosity parameter help message

                        * Separate out parameters to `normalize_name` in CRISPRessoCORE

                        * Separate out parameters to `normalize_name` in CRISPRessoWGS

                        * Separate out parameters to `normalize_name` in CRISPRessoPooled

                        * Separate out parameters to `normalize_name` in CRISPRessoCompare

                        * Fix bug in CRISPRessoPooled by replacing `database_id` with `normalize_name`

                        * Refactor `run_crispresso_cmds` to not require a `logger`

                        This commit implements the functionality to make the `logger` object optional by
                        seeing which module called the `run_crispresso_cmds` function and obtaining the
                        correct object from that module name.

                        The function also immediately returns when no commands are passed to it.

                        * Add amplicon name to plotting debug statements in CRISPRessoCORE
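
The `run_crispresso_cmds` refactor listed above (optional `logger`, immediate return when no commands are passed) could look roughly like the following. This is a minimal sketch, not the project's code: `run_cmds_sketch` and its signature are hypothetical, and resolving the caller's logger with `inspect.stack()` and `logging.getLogger` is an assumption about one way to do it.

```python
import inspect
import logging


def run_cmds_sketch(cmds, logger=None):
    """Hypothetical stand-in for run_crispresso_cmds (not the real signature)."""
    if not cmds:
        # No commands were passed, so return immediately.
        return []
    if logger is None:
        # Look one frame up the call stack to find the calling module's name,
        # then fetch the logger registered under that name.
        caller_frame = inspect.stack()[1].frame
        module_name = caller_frame.f_globals.get("__name__", "__main__")
        logger = logging.getLogger(module_name)
    completed = []
    for cmd in cmds:
        logger.debug("Running: %s", cmd)
        completed.append(cmd)  # placeholder: a real version would run cmd here
    return completed
```

Callers that already hold a logger can still pass it explicitly; the stack lookup is only used as a fallback.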

                        ---------

                        Co-authored-by: Cole Lyman <[email protected]>
                        Co-authored-by: Cole Lyman <[email protected]>
                        Co-authored-by: Cole Lyman <[email protected]>
                        Co-authored-by: Samuel Nichols <[email protected]>

                    commit ff7eca76e6a3a08af4ac18ac4e88d20f2a06b1f9
                    Author: Kendell Clement <[email protected]>
                    Date:   Thu Jan 26 15:27:27 2023 -0500

                        CRISPRessoPooled custom header fix (#278)

                        * Fixing documentation to match pooled headers

                        * Header removal bug fix change documentation to guide_seq

                        * Update documentation and help feature for CRISPRessoPooled

                        * Remove extra newlines from CRISPRessoPooled -h

                        * Make variable names as clear as my firstborn child's name

                        * Update one more variable name

                        Co-authored-by: Samuel Nichols <[email protected]>

                    commit 104866e1080c973bb025d1a5ba59b19dca1658af
                    Author: Cole Lyman <[email protected]>
                    Date:   Thu Jan 5 14:00:26 2023 -0700

                        Fix deprecated numpy type names (fixes #269) (#270)

                        In the most recent version of numpy (1.24) some of the types have been
                        deprecated. This commit fixes these errors.

                    commit 58a8e42df88b66fad6b4f6ad04a5b9d9d43d01b4
                    Author: Cole Lyman <[email protected]>
                    Date:   Thu Jan 5 06:49:35 2023 -0700

                        Add snippet about installing CRISPResso2 via bioconda on Apple silicon (#274)

                        I have suffered enough trying to debug my installation, so hopefully this helps
                        someone else.

                        Co-authored-by: Cole Lyman <[email protected]>

                    commit b9851e98104602eb78c2b384105267624295e9d3
                    Author: Cole Lyman <[email protected]>
                    Date:   Thu Dec 22 13:30:23 2022 -0700

                        Fix bug when pooled bam is input (#265)

                        This change checks to see if a bam file was input, and if so it doesn't try to
                        remove any intermediate files because there aren't any.

                        Co-authored-by: Cole Lyman <[email protected]>

                    commit b822612642043e75a19042941f69b457ce51f517
                    Author: Kendell Clement <[email protected]>
                    Date:   Mon Dec 19 15:26:45 2022 -0500

                        Delete vscode settings

                    commit b99aa624dec68ef7d19264340ce0cafa829625f4
                    Author: Kendell Clement <[email protected]>
                    Date:   Mon Dec 19 13:29:14 2022 -0500

                        Clarify input param help for pooled bam

                    commit 3fae1e8b821ec6b1890bff6561fa8fa67dc49a04
                    Author: Kendell Clement <[email protected]>
                    Date:   Mon Dec 19 13:28:54 2022 -0500

                        Fix #235 - Cigar string is * if read unaligned

                        Previously, the bam would set the cigar string to 0 if the read was unaligned. This breaks the sam->bam conversion and causes the errors in #235.
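
For illustration, a record for an unaligned read with `*` in the CIGAR column might be built as below. This is a sketch under stated assumptions: `unaligned_sam_line` is a hypothetical helper, not a function from CRISPResso2, and the remaining field values follow the usual SAM conventions for an unmapped read.

```python
def unaligned_sam_line(read_name, seq, qual):
    """Build a tab-separated SAM record for an unaligned read (hypothetical helper)."""
    fields = [
        read_name,  # QNAME
        "4",        # FLAG: 0x4 marks the read as unmapped
        "*",        # RNAME: no reference sequence
        "0",        # POS: no position
        "0",        # MAPQ
        "*",        # CIGAR: '*' (not '0') when no alignment is available
        "*",        # RNEXT
        "0",        # PNEXT
        "0",        # TLEN
        seq,        # SEQ
        qual,       # QUAL
    ]
    return "\t".join(fields)
```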

                    commit c65ba07dc5a983453cdf7bb1e27005230dac6f1b
                    Author: Cole Lyman <[email protected]>
                    Date:   Thu Dec 8 13:48:17 2022 -0700

                        Add deprecation notice (#260)

                        * Add FLASh and Trimmomatic deprecation notice to CLI output

                        * Add Edilytics email address to CLI output

                    commit 2a30e5a45f5350ee7c6435bce1cd4edc4d31668a
                    Author: Kendell Clement <[email protected]>
                    Date:   Tue Dec 6 12:16:19 2022 -0500

                        Format filterReadsOnSequencePresence script

                    commit 9d764414edd88a46ad5e4f496e4f1c8d5d60ce3e
                    Author: Kendell Clement <[email protected]>
                    Date:   Fri Dec 2 22:12:54 2022 -0500

                        Clarify default CRISPRessoPooled settings for use_legacy_bowtie2_options_string

                    commit 9ddea40f7f02b546941ddaa4c71fc5283075051a
                    Author: kclem <[email protected]>
                    Date:   Mon Nov 14 10:33:04 2022 -0500

                        Add check for prime editing extension sequence in prime edited sequence

                        If the user specifies prime_editing_override_prime_edited_ref_seq, it may not contain the extension sequence (for example, if the extension sequence is not provided in the appropriate orientation), so check for that here. The extension sequence should be provided as the reverse complement of the prime edited sequence.
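
The orientation check described above can be sketched as follows. The function names are hypothetical and the actual check in CRISPResso2 may differ (for instance, it may use alignment rather than an exact substring match); the sketch only shows the reverse-complement relationship.

```python
# Translation table for complementing DNA bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def extension_in_edited_ref(extension_seq, edited_ref_seq):
    """True if the reverse complement of the extension occurs in the edited reference.

    Hypothetical helper: exact, case-insensitive substring match.
    """
    return reverse_complement(extension_seq).upper() in edited_ref_seq.upper()
```

With an edited reference of `TTGACCATGCA`, the extension `ATGGTC` passes (its reverse complement is `GACCAT`), while `GACCAT` given in the wrong orientation does not.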

                    commit 152f2dd5001da7090641ee8a1326bde9f7e8104e
                    Author: kclem <[email protected]>
                    Date:   Wed Nov 9 11:53:41 2022 -0500

                        Version bump to 2.2.11a

                    commit 9ed356e3a0c6c316d0860d121772f80ddca6de1d
                    Author: kclem <[email protected]>
                    Date:   Wed Nov 9 11:47:30 2022 -0500

                        Add param to override prime editing sequence checks

                        CRISPResso checks that prime editing guides are provided in the proper orientation (e.g. pegRNA 3'->5', spacer sequence 5'->3') and checks these orientations by alignment. Sometimes, the alignment can be better in the opposite direction, and this parameter allows these checks to be overridden. Otherwise, these checks would halt the program and produce the output 'The prime editing pegRNA spacer sequence appears to be given in the 3\'->5\' order. The prime editing pegRNA spacer sequence (--prime_editing_pegRNA_spacer_seq) must be given in the RNA 5\'->3\' order.'

                    commit 39dd80afb98a22b7edb6f801c363d86bb77eeb5b
                    Author: kclem <[email protected]>
                    Date:   Wed Nov 9 10:06:51 2022 -0500

                        Update filterReadsOnSequencePresence.py

                    commit fe55526927e3fb6e17c9a8a6f59c7057bc1e14eb
                    Author: Kendell Clement <[email protected]>
                    Date:   Mon Nov 7 22:25:16 2022 -0500

                        Add script to filter input based on sequence presence

                    commit 713e57a19c35180035ca35e11a5820065eda0198
                    Author: Kendell Clement <[email protected]>
                    Date:   Tue Oct 18 16:02:26 2022 -0400

                        Allow spaces in read names for CRISPRessoWGS

                    commit 39ce008bdddccdd8229c0ba185dce78bc2f66968
                    Author: Cole Lyman <[email protected]>
                    Date:   Sat Oct 8 21:09:58 2022 -0600

                        Fix typo of CRISPResssoPlot when plotting nucleotide quilt (#250)

                    commit 6a2b342c8503b7327c0a2414edfbd16912d60ca5
                    Author: Kendell Clement <[email protected]>
                    Date:   Sat Oct 8 23:08:47 2022 -0400

                        Batch amplicon plots (#251)

                        * Error out if HDR amplicon matches existing amplicon

                        * Add check for amplicon sequence uniqueness

                        * Fix bug with bam_input not having bam_output

                        * Test for no returned lines in auto mode, version bump to 2.2.11

                        * Fix pandas deprecation of df.append

                    commit 726b2b93d6e419a1b0aa6a968c97edc55b4cc5a8
                    Author: Kendell Clement <[email protected]>
                    Date:   Thu Oct 6 16:32:02 2022 -0400

                        Fix CRISPRessoBatch plot pool bug when plots are suppressed

                    commit 7e5049c4dfb88cbc87c91935a91d1f51120a10c2
                    Author: Cole Lyman <[email protected]>
                    Date:   Wed Sep 21 21:04:51 2022 -0600

                        Fix batch quilt plot name (#249)

                        This fixes an incorrectly named allele quilt plot input in CRISPRessoBatch.

                    commit 1821ca5029c5a1485733f13ab3f2048b4f1fa04e
                    Author: Kendell Clement <[email protected]>
                    Date:   Thu Sep 15 15:49:08 2022 -0400

                        Version bump to 2.2.10

                    commit c5f79aebfc1ae209f4ee320df250eed89a02787c
                    Author: Cole Lyman <[email protected]>
                    Date:   Wed Sep 14 14:24:55 2022 -0600

                        Parallel plot refactor (#247)

                        * Fix duplicate plotting in CRISPRessoBatch aggregate

                        * Refactor multiprocessing plots in CRISPRessoBatch

                        * Refactor multiprocessing plots in CRISPRessoCORE

                        * Refactor multiprocessing plots for CRISPRessoAggregate

                    commit 4ed5e24e6cc1dd8068e2391573ae2438acd32db2
                    Author: Kendell Clement <[email protected]>
                    Date:   Tue Sep 13 14:12:11 2022 -0400

                        print files in curr dir if Aggregate can't find files

                    commit ce25bc06f29988e7a10afd0b6a09ba0caf0950e0
                    Author: Kendell Clement <[email protected]>
                    Date:   Mon Sep 12 10:32:57 2022 -0400

                        Spelling typo

                    commit c15f01c75083403f17c58c121b2afe97e9f2a1ec
                    Author: Kendell Clement <[email protected]>
                    Date:   Tue Sep 6 17:49:52 2022 -0400

                        Add helper function to create alignment scoring matrix

                        New scoring matrix can be created using CRISPResso2Align.make_matrix()

                    commit c80f82838c5a228b79ad4484092877cfee08e02c
                    Author: Cole Lyman <[email protected]>
                    Date:   Mon Aug 22 18:28:33 2022 -0600

                        Add `zip_output` (#240)

                        * Making zip of results

                        * Zip command added; if zip is true, place_report_in_output_folder is also true; zip removes all files while zipping

                        * Adding --zip to compare and pooled/wgs compare

                        * Add more formatting changes to CRISPRessoShared

                        * Refactoring propagate_crispresso_options so only one version exists

                        * Zip added to arguments_to_ignore and warning added when changing arguments

                        * Restore styling

                        * Update README to include --zip

                        * Rename --zip to --zip_output

                        * Change --zip to --zip_output in CompareCORE and PooledWGSCompareCORE

                        * Bug fix arg to args

                        Co-authored-by: Samuel Nichols <[email protected]>

                    commit 5de3d7286d8e33c7cf4d3615fce715806e72f511
                    Author: Kendell Clement <[email protected]>
                    Date:   Thu Aug 11 21:42:34 2022 -0400

                        Fix fix to aggregate for CRISPRessoWGS

                    commit a2294c266f43b14969a5d6474076f31a77a57173
                    Author: Kendell Clement <[email protected]>
                    Date:   Thu Aug 11 21:40:50 2022 -0400

                        Fix bug in aggregate for WGS

                    commit 7ce3eb4abe4b8ceac933272ac9cb16a8bedf26a3
                    Author: Kendell Clement <[email protected]>
                    Date:   Mon Aug 8 21:53:45 2022 -0400

                        Update CRISPRessoWGS to allow non-word characters in region names

                    commit 040ac0033d6e250f4e3a412101874cf5e914e08a
                    Author: kclem <[email protected]>
                    Date:   Mon Aug 8 16:04:59 2022 -0400

                        Enable processing of cram files by CRISPRessoWGS

                        Adds --reference to samtools view when viewing cram files

                    commit cf112a0caba8789e28530cc09171285ec6ea9b4c
                    Author: kclem <[email protected]>
                    Date:   Mon Aug 8 14:55:46 2022 -0400

                        Auto amplicon detection for interleaved input

                        Enables processing of interleaved fastq files for guess_guides and guess_amplicons, as well as get_most_frequent_reads. When interleaved input is present, the input is first separated into R1/R2 files, then processing is performed.

                    commit 4ba524dc7b947feca8a0f743837844f9febc2171
                    Author: Cole Lyman <[email protected]>
                    Date:   Thu Aug 4 11:32:11 2022 -0600

                        Potential fix for aggregate plots in Batch mode (#237)

                    commit 6097a8a104d3f156ef7c08e196ac37e32bf04c71
                    Author: Kendell Clement <[email protected]>
                    Date:   Thu Jul 21 22:45:48 2022 -0400

                        Fix pct_vectors in crispresso2_info json object

                    commit 65a079d86d6f386793397398f839c46014b54543
                    Author: Kendell Clement <[email protected]>
                    Date:   Wed Jul 20 23:46:37 2022 -0400

                        Fix more readme spelling bugs

                    commit e817376ecd54cdea1f29e303ca25b9e7d1d38333
                    Author: Kendell Clement <[email protected]>
                    Date:   Wed Jul 20 23:42:23 2022 -0400

                        Fix bug in readme spelling

                    commit 49740ba1d66ed6d13a9e154b8b17bc8b5186581d
                    Author: Kendell Clement <[email protected]>
                    Date:   Wed Jul 20 16:10:09 2022 -0400

                        Fix loading of crispresso info from WGS and Pooled

                    commit b68a43271115251b18e8955e285ccc18f549e8cd
                    Author: Kendell Clement <[email protected]>
                    Date:   Thu Jul 14 14:11:04 2022 -0400

                        Add plotly to dockerfile

                    commit b0b7d41d697304d0d5fc93e3346c9de1b98ba41d
                    Author: Kendell Clement <[email protected]>
                    Date:   Thu Jul 14 14:10:00 2022 -0400

                        Fix #231 Allow N's in bam output (Try 2)

                    commit c460b3e73fd06a230dbac2e37c86b833144ebf94
                    Author: Kendell Clement <[email protected]>
                    Date:   Thu Jul 14 14:09:10 2022 -0400

                        Revert "Fix #231 Allow N's in bam output"

                        This reverts commit 2f6ad1dbe05210af9ccc1b1f17783cd212a888d3.

                    commit 2f6ad1dbe05210af9ccc1b1f17783cd212a888d3
                    Author: Kendell Clement <[email protected]>
                    Date:   Thu Jul 14 13:52:37 2022 -0400

                        Fix #231 Allow N's in bam output

                    commit 0a2419e518dc9b3520058c3927f98b31cd51347e
                    Author: Cole Lyman <[email protected]>
                    Date:   Fri Jul 8 21:10:01 2022 -0600

                        Fix bug when name is provided instead of amplicon_name in pooled input file (#229)

                        Also, raise an exception (instead of incorrectly executing) when there are not
                        enough matched parameters in the pooled input file.

                    commit cb58212379803788c04ca5793baaa760cbbeaa81
                    Author: Cole Lyman <[email protected]>
                    Date:   Fri Jul 8 21:09:49 2022 -0600

                        Fix bug when comparing two samples with the same name. (#228)

                    commit e8a796f5f451409cbafed4404dfba4b6b8a124ca
                    Author: Kendell Clement <[email protected]>
                    Date:   Thu Jun 23 21:30:23 2022 -0400

                        Version bump to 2.2.9

                    commit 632143ddedea48bab9229baeb4bf3ea4d1f658d6
                    Author: Cole Lyman <[email protected]>
                    Date:   Mon Jun 20 19:53:14 2022 -0600

                        Don't run global frameshift plot when there are no reads (#226)

                        When there are no reads (i.e. global_MODIFIED_FRAMESHIFT +
                        global_MODIFIED_NON_FRAMESHIFT + global_NON_MODIFIED_NON_FRAMESHIFT == 0) there
                        was a bug when trying to compute the pie chart, because all of the values in the
                        pie chart are 0. This fix makes sure that there is at least one read so
                        that the plot can be constructed properly.
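
A minimal sketch of the guard described above, assuming the three counts are plain integers. `should_plot_global_frameshift` is a hypothetical name; the commit message does not show the actual code.

```python
def should_plot_global_frameshift(modified_frameshift,
                                  modified_non_frameshift,
                                  non_modified_non_frameshift):
    """Return True only when at least one read contributes to the pie chart."""
    total_reads = (modified_frameshift
                   + modified_non_frameshift
                   + non_modified_non_frameshift)
    # A pie chart whose values are all 0 cannot be drawn, so skip it.
    return total_reads > 0
```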

                    commit 4bb06218e835d2624d53fd401542caef6f8a3a55
                    Author: kclem <[email protected]>
                    Date:   Fri Jun 3 16:57:02 2022 -0400

                        Improvements for guide inference in 'auto' mode

                        In 'auto' mode, a putative guide sequence is selected at the site of maximal editing. If the site of maximal editing happens near the end of the guide (e.g. base 0), many things will break (e.g. quantification windows). This update excludes bases from being used to find the guide using the --exclude_bp_from_left and --exclude_bp_from_right parameters. By default, these parameters are 15bp, so the first and last 15bp are not considered for the site of maximal editing and thus cannot be the site of a guide sequence. In addition, the site of maximal editing must have 3x the magnitude of the background.
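
The selection rule above can be sketched as follows. This is not the CRISPResso2 implementation: `find_max_editing_site` is a hypothetical function, the input is assumed to be a list of per-base modification percentages, and "background" is assumed here to be the mean of the other candidate positions, which the commit message does not specify.

```python
def find_max_editing_site(mod_pcts, exclude_left=15, exclude_right=15,
                          fold_over_background=3.0):
    """Return the index of the site of maximal editing, or None if no site qualifies."""
    n = len(mod_pcts)
    # Only positions outside the excluded flanks are candidates.
    candidates = range(exclude_left, n - exclude_right)
    if len(candidates) == 0:
        return None
    best = max(candidates, key=lambda i: mod_pcts[i])
    # Assumed definition of background: mean over the other candidate positions.
    others = [mod_pcts[i] for i in candidates if i != best]
    background = sum(others) / len(others) if others else 0.0
    if mod_pcts[best] > 0 and mod_pcts[best] >= fold_over_background * background:
        return best
    return None
```

A peak at base 0 is ignored because it lies inside the excluded left flank, and a flat profile yields `None` because no position reaches 3x the background.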

                    commit 9d64de187835b2553ad2b4374d32edab27f83645
                    Author: Kendell Clement <[email protected]>
                    Date:   Thu Jun 2 20:22:25 2022 -0400

                        Update README.md

                    commit 6aafc5387986f5089ba55b68d128343d68052792
                    Author: Simon P Shen <[email protected]>
                    Date:   Tue May 31 17:42:53 2022 -0400

                        directory in quotes in batch cmd (#222)

                        Add quotes around output folder for folders that have spaces.
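
One way to quote a folder name safely when building a shell command string is Python's `shlex.quote`, shown below. This is an illustrative sketch: `build_cmd` is a hypothetical helper, and the commit itself may simply wrap the path in literal quote characters.

```python
import shlex


def build_cmd(output_folder):
    """Build a command string whose output folder survives shell word splitting."""
    # shlex.quote wraps the value in single quotes when it contains spaces
    # or other shell-special characters (POSIX shells).
    return "CRISPResso --output_folder " + shlex.quote(output_folder)
```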

                    commit 432f163ac68b9a650d1fd326171aadc505ee87f4
                    Author: Kendell Clement <[email protected]>
                    Date:   Tue May 24 23:38:36 2022 -0400

                        CRISPRessoBatch fills NA values in batch settings

                        NA values in CRISPRessoBatch are filled with the value from args - either the default value or the value from the command line args (if set)
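
The fill behaviour can be sketched with plain dictionaries, assuming NA is represented as `None` and that `args_values` already holds either the default or the command-line value for each parameter. `fill_missing_settings` and the example keys are hypothetical; the real code presumably operates on the batch settings table.

```python
def fill_missing_settings(batch_rows, args_values):
    """Return copies of batch_rows with None entries replaced from args_values."""
    filled = []
    for row in batch_rows:
        new_row = {}
        for key, value in row.items():
            if value is None and key in args_values:
                # Missing in the batch file: fall back to the value from args.
                new_row[key] = args_values[key]
            else:
                new_row[key] = value
        filled.append(new_row)
    return filled
```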

                    commit 6de774adbad3aa8cd99d07b0ba7692984b356cd4
                    Author: kclem <[email protected]>
                    Date:   Mon May 23 14:18:02 2022 -0400

                        Fix file naming bug for HDR outputs

                        In html file, figures 4e and 4f incorrectly referenced figure 4d. This fixes this bug.

                    commit b88fec0668a4082a12ead3d26582e86d829dd7cc
                    Author: Kendell Clement <[email protected]>
                    Date:   Sat May 21 00:32:15 2022 -0400

                        For bam_output, fix bug that wrote unaligned lines twice

                    commit 3564e77ebcdedb4b01cc01dcca18ba3221fac67c
                    Author: Kendell Clement <[email protected]>
                    Date:   Thu May 19 16:32:18 2022 -0400

                        Update README with CRISPRessoPooled headers and bam_output parameters

                    commit bc08d81f17cb1929d1c37a1773cffcf36fb12fe2
                    Author: Kendell Clement <[email protected]>
                    Date:   Thu May 19 16:11:30 2022 -0400

                        Add more links to tools

                    commit 006c497a379ecd94b017a883a5db887861e1586a
                    Author: Kendell Clement <[email protected]>
                    Date:   Thu May 19 16:08:14 2022 -0400

                        Add links to tools

                    commit dc8243373ad00d6bd467fc30c59942596ff0c5d6
                    Author: Kendell Clement <[email protected]>
                    Date:   Mon May 16 21:38:06 2022 -0400

                        fastq_to_bam implementation (#219)

                    commit e88b6833977c6b2768299e0b2e7af623e3a9ae7c
                    Author: Kendell Clement <[email protected]>
                    Date:   Sun May 8 02:14:13 2022 -0400

                        Fix bug for when guides don't agree in CRISPRessoAggregate

                    commit 7eb763116a8c60603f1cd654645215767ee8eb52
                    Author: Kendell Clement <[email protected]>
                    Date:   Thu May 5 03:28:21 2022 -0400

                        Fix bug for case of empty summary plots in report generation

                    commit 0324fa67d14ed945f0c9531d9bcf73ebcf4ca042
                    Author: Kendell Clement <[email protected]>
                    Date:   Thu May 5 03:28:02 2022 -0400

                        Create report for number of significant bases in CRISPRessoCompare

                    commit e3c9d0026a9ee6732f3ed6bdcf2a824850d7e66a
                    Author: Kendell Clement <[email protected]>
                    Date:   Wed May 4 22:43:11 2022 -0400

                        Update pickle to json in readme and CRISPRessoPooledWGSCompare

                    commit 1553f7977c12bf1091a20ca55b878bccfb739b61
                    Author: Kendell Clement <[email protected]>
                    Date:   Wed May 4 18:10:04 2022 -0400

                        Merge pull request #4 from pinellolab/master (#218)

                    commit bcecbfc047d294e26f381a6668e08cb4db24445c
                    Merge: 15b0e05b bb13e007
                    Author: Kendell Clement <[email protected]>
                    Date:   Wed May 4 18:06:37 2022 -0400

                        Merge branch 'master' into master

                    commit bb13e007738d6e7a4909e01f03daff592f334f36
                    Merge: af4ab6e8 d0b41483
                    Author: Kendell Clement <[email protected]>
                    Date:   Wed May 4 17:59:32 2022 -0400

                        Merge branch 'master' of https://github.com/edilytics/CRISPResso2

                    commit 15b0e05b9e03bbec5236e58776ddf9aa2f93180e
                    Author: Kendell Clement <[email protected]>
                    Date:   Wed May 4 17:54:52 2022 -0400

                        2 flexible pooled input (#217)

                        * Batch type coerce and r2 file check

                        * Upgrade tabs for bootstrap5

                        * Update readme with additional pooled amplicon file headers

                        Co-authored-by: Samuel Nichols <[email protected]>

                    commit d0b41483bee704940ba60c58289f412b04c71659
                    Author: Kendell Clement <[email protected]>
                    Date:   Wed May 4 13:43:43 2022 -0400

                        Update README.md

                    commit ce49fab5301cb73ba0daf6c765e350eb083c76f1
                    Merge: 5f909713 b913fcb4
                    Author: Kendell Clement <[email protected]>
                    Date:   Wed May 4 13:40:30 2022 -0400

                        Merge pull request #3 from edilytics/2-flexible-pooled-input

                        Add flexibility to CRISPRessoPooled amplicon input by allowing headers. Also, prime editing and quantification window coordinate parameters can be passed to CRISPRessoPooled.

                    commit b913fcb402a8ba3106c3ff7913563a33d8d19fca
                    Author: Kendell Clement <[email protected]>
                    Date:   Wed May 4 13:38:25 2022 -0400

                        Update CRISPRessoPooledCORE.py

                        Replace process to read header, increase flexibility for column order

                    commit 945bf31f16530b7ce25b89095b2c7005bf146117
                    Merge: 7b8f6788 5f909713
                    Author: Kendell Clement <[email protected]>
                    Date:   Wed May 4 12:45:24 2022 -0400

                        Merge branch 'master' into 2-flexible-pooled-input

                    commit 5f9097133765736a7c2fe3c8e9b730845fed0b70
                    Author: Kendell Clement <[email protected]>
                    Date:   Wed May 4 12:23:44 2022 -0400

                        Version bump to 2.2.8

                    commit c4a94ce0e06c6ebae13e128fbe6b708e635121c4
                    Author: Kendell Clement <[email protected]>
                    Date:   Wed May 4 00:13:17 2022 -0400

                        Fix summary plot representation for multi reports

                        * fixed old reference to make_multi_report which called old summary plot format
                        * renamed summary_plot to summary_plots to reflect a dict with multiple plots

                    commit 62900e9ae6fa37ce99a04f12a63ed5c912f75042
                    Author: Cole Lyman <[email protected]>
                    Date:   Tue May 3 20:47:52 2022 -0600

                        Large aggregation (#192)

                        * Squashed commit of the following:

                        commit 8564eb03f0d9e62abf4b7528baf5c2ae296be8f9
                        Merge: f6ef62c 07cc7d8
                        Author: Kendell Clement <[email protected]>
                        Date:   Tue Jan 11 16:20:15 2022 -0500

                            Merge branch 'indel-alignment-fix' of https://github.com/edilytics/CRISPResso2 into indel-alignment-fix

                        commit 07cc7d856ab3fcbbaa5381f17f29568192388887
                        Author: Cole Lyman <[email protected]>
                        Date:   Fri Dec 10 15:29:59 2021 -0700

                            Fix bug in `find_indels_substitutions`

                            This bug occurred when there was a deletion at the end of a sequence, and was
                            thus not properly accounted for.

                        commit f6ef62cfdf909adac1b10ea86555cd218f8b2a74
                        Author: Cole Lyman <[email protected]>
                        Date:   Fri Dec 10 15:29:59 2021 -0700

                            Fix bug in `find_indels_substitutions`

                            This bug occurred when there was a deletion at the end of a sequence, and was
                            thus not properly accounted for.

                        commit 7212f87f4be60057a6c848947ff6b5efde132a25
                        Author: Cole Lyman <[email protected]>
                        Date:   Fri Dec 10 15:26:17 2021 -0700

                            Add a unit test for `find_indels_substitutions`

                            This unit test checks for deletions at the end of a sequence, which are
                            inherently outside of the include_indx_set window.

                        commit d50b4e903b973c71a275e31d470b40e59280ee13
                        Author: Cole Lyman <[email protected]>
                        Date:   Fri Dec 10 15:03:22 2021 -0700

                            Fix a bug in `find_indels_substitutions`

                            The bug that this commit fixes is when an insertion occurs at the edge of the
                            include indexes. The trouble with this earlier was that it was using the `idx`
                            to calculate the size of the insertion, but the `idx` wasn't being incremented
                            anymore because it was outside of the include window.

                        commit 4db066f7bc333b7662a9232ac732ebb33ac3ace8
                        Author: Cole Lyman <[email protected]>
                        Date:   Fri Dec 10 15:01:39 2021 -0700

                            Add test case for `find_indels_substitutions`

                            This test case is extracted from the CRISPRessoBatch integration test and
                            provides an example where there is an insertion at the edge of the include
                            index.

                        commit 3b3a7417f5bbd6c2785a2af54a47e01d2e820451
                        Author: Cole Lyman <[email protected]>
                        Date:   Fri Dec 10 11:37:07 2021 -0700

                            Fix bug in CRISPRessoCompare where sample names were not properly set

                            This was a place where it was (partially) missed during the crispresso2_info
                            object refactoring.

                        commit e9f5eff3d95b676b5ee2e23371a5604f600d34b2
                        Author: Cole Lyman <[email protected]>
                        Date:   Fri Dec 10 15:26:17 2021 -0700

                            Add a unit test for `find_indels_substitutions`

                            This unit test checks for deletions at the end of a sequence, which are
                            inherently outside of the include_indx_set window.

                        commit d4d45a918254ab19a7e7956e9e731389c6f36ecb
                        Author: Cole Lyman <[email protected]>
                        Date:   Fri Dec 10 15:03:22 2021 -0700

                            Fix a bug in `find_indels_substitutions`

                            The bug that this commit fixes is when an insertion occurs at the edge of the
                            include indexes. The trouble with this earlier was that it was using the `idx`
                            to calculate the size of the insertion, but the `idx` wasn't being incremented
                            anymore because it was outside of the include window.

                        commit 13f00bb40239c83e6e5cf844561fdb7000d3d9ab
                        Author: Cole Lyman <[email protected]>
                        Date:   Fri Dec 10 15:01:39 2021 -0700

                            Add test case for `find_indels_substitutions`

                            This test case is extracted from the CRISPRessoBatch integration test and
                            provides an example where there is an insertion at the edge of the include
                            index.

                        commit 659ae34e8fd106f7ecc163b5bea0b5a80ab0283c
                        Author: Cole Lyman <[email protected]>
                        Date:   Fri Dec 10 11:37:07 2021 -0700

                            Fix bug in CRISPRessoCompare where sample names were not properly set

                            This was a place where it was (partially) missed during the crispresso2_info
                            object refactoring.

                        * Add parameter `--suppress_batch_summary_plots`

                        If many runs are run at the same time, batch summary plots may fail because they are too large for matplotlib. This parameter `--suppress_batch_summary_plots` allows individual runs to be plotted, but suppresses batch summary plots that may otherwise be too big.

                        * Pep formatting cleanup

                        * Add summary nucleotide plots to aggregate

                        * Aggregate plots are paginated

                        * Update CRISPRessoAggregateCORE.py

                        Remove max sample limit for plotting

                        * Add --max_samples_per_summary_plot to CRISPRessoAggregate

                        Parameterize the max number of samples to plot on each page of reports. Additional PDFs will be created with this number of samples on them.

                        * Add plotly function to plot an interactive heatmap

                        * Fix deprecated numpy type to suppress warning

                        * Add plotting of heatmaps to CRISPRessoAggregateCORE to summarize modification types

                        These heatmaps are interactive (zoomable and pannable) and show for each sample
                        the percentage of insertions, substitutions, and deletions.

                        * Add the heatmap summaries to the CRISPRessoAggregate report

                        * Update Bootstrap to 5.1.3

                        This is mainly so that we can use the fullscreen modal functionality in this version.

                        * Move the plotly heatmaps to a Bootstrap modal

                        * Fix bug where plots were not filling up entire modal.

                      …
mbowcut2 added a commit to edilytics/CRISPResso2 that referenced this pull request Jun 19, 2024
* Change CRISPResso_status.txt format to JSON (#46)

* Bug Fix - 367 (#35)

* - Fixed references to ref_names_for_pe

* removed extra tabs

* trying to match empty line, no tabs

* - changed references to ref_names[0]

* Mckay/pd warnings (#45)

* refactor errors='ignore' to try except

* refactored integer slice to iloc[]

* moved to_numeric try except to function

* Refactor to_numeric_ignore_errors to to_numeric_ignore_columns

This change is slightly cleaner because it addresses the root issue that some
columns are strings (and can therefore not be converted to numeric types). Now
if an error does occur when converting the dfs to numeric types it won't be
swallowed up.

* Add documentation to to_numeric_ignore_columns

---------

Co-authored-by: Cole Lyman <[email protected]>

---------

Co-authored-by: Cole Lyman <[email protected]>

* add json read for status file

* changed Formatter to json format

* fixed json access variable name: message

* changed percentage_complete to numeric

* changed status file to .json

* Create integration_tests.yml

* Simplify name

* CRISPRESSO2_DIR environment variable

* Up one dir

* ls workspace

* Install CRISPResso and ydiff

* Clone repo instead of checkout

* submodule

* ls

* CRISPResso2_copy

* ls

* Update env

* Simplify

* Pull from githubactions branch

* Pull githubactions repo

* Checkout githubactions

* Run tests individually

* Pin plotly version

* Run all tests even if one fails

* Test on another branch

* Switch branch with token

* Update integration_tests.yml

* New makefile commands

* changed file to .json

* changed status to json file

* Make JSON human readable by adding new lines
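The status-file changes above (JSON format, a numeric completion value, a `message` field, and newlines for readability) can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the helper names `write_status` and `read_status`, the key `percent_complete`, and the file name `CRISPResso_status.json` are assumptions made for the example.

```python
import json
import os
import tempfile


def write_status(path, percent_complete, message):
    """Write a run-status record as indented JSON followed by a newline.

    Hypothetical helper: the key names here are assumptions for illustration.
    """
    status = {
        "percent_complete": float(percent_complete),  # stored as a number, not a string
        "message": message,
    }
    with open(path, "w") as handle:
        # indent=4 spreads the record over several lines so it is human readable
        json.dump(status, handle, indent=4)
        handle.write("\n")


def read_status(path):
    """Read the status record back as a dict."""
    with open(path) as handle:
        return json.load(handle)


# Round-trip a status record through a temporary directory.
status_path = os.path.join(tempfile.mkdtemp(), "CRISPResso_status.json")
write_status(status_path, 45, "Aligning sequences...")
status = read_status(status_path)
```

Storing the completion value as a number means a reader of the file can compare or format it directly instead of parsing a string first.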

* GitHub actions integration tests (#48)

* GitHub actions clean (#40)

* Create pytest.yml

* Create pylint.yml

* Create .pylintrc

* Create test_env.yml

* Full path

* Remove conda install

* Replace path

* Pytest tests

* pip -e

* Create integration_tests.yml

* Simplify name

* CRISPRESSO2_DIR environment variable

* Up one dir

* ls workspace

* Install CRISPResso and ydiff

* Clone repo instead of checkout

* submodule

* ls

* CRISPResso2_copy

* ls

* Update env

* Simplify

* Pull from githubactions branch

* Pull githubactions repo

* Checkout githubactions

* Mckay/pd warnings (#45)

* refactor errors='ignore' to try except

* refactored integer slice to iloc[]

* moved to_numeric try except to function

* Refactor to_numeric_ignore_errors to to_numeric_ignore_columns

This change is slightly cleaner because it addresses the root issue that some
columns are strings (and can therefore not be converted to numeric types). Now
if an error does occur when converting the dfs to numeric types it won't be
swallowed up.

* Add documentation to to_numeric_ignore_columns

---------

Co-authored-by: Cole Lyman <[email protected]>

* Run tests individually

* Pin plotly version

* Run all tests even if one fails

* Test on another branch

* Switch branch with token

* Update integration_tests.yml

* Introduce pandas sorting in CRISPRessoCompare (#47)

* New makefile commands

* Fix interleaved fastq input in CRISPRessoPooled and suppress CRISPRessoWGS params (#42)

* Extract out split_interleaved_fastq function to CRISPRessoShared

* Implement splitting interleaved fastq files in CRISPRessoPooled

* Suppress split_interleaved_input from CRISPRessoWGS parameters

* Suppress other parameters in CRISPRessoWGS

* Move where interleaved fastq files are split to be trimmed properly

* Bug Fix - 367 (#35)

* - Fixed references to ref_names_for_pe

* removed extra tabs

* trying to match empty line, no tabs

* - changed references to ref_names[0]

* Mckay/pd warnings (#45)

* refactor errors='ignore' to try except

* refactored integer slice to iloc[]

* moved to_numeric try except to function

* Refactor to_numeric_ignore_errors to to_numeric_ignore_columns

This change is slightly cleaner because it addresses the root issue that some
columns are strings (and can therefore not be converted to numeric types). Now
if an error does occur when converting the dfs to numeric types it won't be
swallowed up.

* Add documentation to to_numeric_ignore_columns

---------

Co-authored-by: Cole Lyman <[email protected]>

---------

Co-authored-by: Cole Lyman <[email protected]>

* On push no branches

* On push no branches

* All in one file

* Fix yml errors

* Rename jobs

* Remove old workflow files

* Remove paths

* Run jobs in parallel

---------

Co-authored-by: mbowcut2 <[email protected]>
Co-authored-by: Cole Lyman <[email protected]>

* Move read filtering to after merging in CRISPResso (#39)

* Move read filtering to after merging

This is in an effort to be consistent with the behavior and results of
CRISPRessoPooled.

* Properly assign the correct file names for read filtering

* Add space around operators

* GitHub actions on pr (#51)

* Run integration tests on pull_request

* Run pytest on pull_request

* Run pylint on pull_request

* Run tests on PR only when opening PR (#53)

* Update reports (#52)

* Update report changes

* Switch branch of integration test repo

* Remove extraneous `crispresso_data_path`

* Point integration tests back to master

* point to test branch

* pointed CI config to testing branch

* Update integration_tests.yml

point to master

---------

Co-authored-by: Cole Lyman <[email protected]>
Co-authored-by: Samuel Nichols <[email protected]>

* Trevor/fastp integration (#50)

* Update check_program to check versions and create check_fastq function

* Update fastq arg, implement fastp in get_most_frequent_reads

* Bump version to 2.3.0

* Deprecate Flash and Trimmomatic parameters, and update fastp params

* Update guess_amplicons and guess_guides to remove max_paired_end_reads_overlap

* Implement trimming of single end reads

* Merge (and trim) reads in CRISPRessoCORE with fastp

* Modify error handling to account for fastp errors

* Replace flash and trimmomatic with fastp in Docker dependencies

* Update LICENSE.txt with fastp info

* Remove min and max amplicon length (no longer needed)

* Mckay/pd warnings (#45)

* refactor errors='ignore' to try except

* refactored integer slice to iloc[]

* moved to_numeric try except to function

* Refactor to_numeric_ignore_errors to to_numeric_ignore_columns

This change is slightly cleaner because it addresses the root issue that some
columns are strings (and can therefore not be converted to numeric types). Now
if an error does occur when converting the dfs to numeric types it won't be
swallowed up.

* Add documentation to to_numeric_ignore_columns

---------

Co-authored-by: Cole Lyman <[email protected]>

* Implement trimming with fastp in CRISPRessoPooled

* Implement merging (and trimming) with fastp in CRISPRessoPooled

* Fixed minor fastp errors

* Move read filtering to after merging in CRISPResso (#39)

* Move read filtering to after merging

This is in an effort to be consistent with the behavior and results of
CRISPRessoPooled.

* Properly assign the correct file names for read filtering

* Add space around operators

* GitHub actions on pr (#51)

* Run integration tests on pull_request

* Run pytest on pull_request

* Run pylint on pull_request

* Run tests on PR only when opening PR (#53)

* Update reports (#52)

* Update report changes

* Switch branch of integration test repo

* Remove extraneous `crispresso_data_path`

* Point integration tests back to master

* Update where the test point to

* Fix 'Prime-edited' key not found (#32)

* Move 'Prime-edited' amplicon name check

By moving this, it will check if there is an amplicon named
'Prime-edited' (which is a reserved name) even if the
`prime_editing_pegRNA_extension_seq` parameter is empty.

* Only search for scaffold integration when pegRNA extension seq is provided

* Remove spaces at the end of lines
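The ordering described in the 'Prime-edited' fix can be sketched as below: the reserved-name check runs unconditionally, and only the prime-editing-specific work is gated on the pegRNA extension sequence. This is a hedged illustration under assumptions; `validate_amplicon_names` and its return value are hypothetical and do not reproduce CRISPResso's actual code.

```python
RESERVED_AMPLICON_NAME = "Prime-edited"


def validate_amplicon_names(amplicon_names, pegRNA_extension_seq=""):
    """Check amplicon names, then report whether prime-editing steps should run.

    Hypothetical helper: the reserved-name check happens first and does not
    depend on whether a pegRNA extension sequence was provided.
    """
    if RESERVED_AMPLICON_NAME in amplicon_names:
        raise ValueError(
            "An amplicon cannot be named '%s' because that name is reserved."
            % RESERVED_AMPLICON_NAME
        )
    # Only the prime-editing-specific work (for example, searching for
    # scaffold integration) depends on the extension sequence being present.
    return bool(pegRNA_extension_seq)


run_prime_editing_steps = validate_amplicon_names(["Reference"], "ACGT")
skip_prime_editing_steps = validate_amplicon_names(["Reference"])
```

With the check placed before the gate, a run that names an amplicon 'Prime-edited' fails early with a clear error whether or not prime editing is enabled.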

* Docker size (#49)

* Bug Fix - 367 (#35)

* - Fixed references to ref_names_for_pe

* removed extra tabs

* trying to match empty line, no tabs

* - changed references to ref_names[0]

* Mckay/pd warnings (#45)

* refactor errors='ignore' to try except

* refactored integer slice to iloc[]

* moved to_numeric try except to function

* Refactor to_numeric_ignore_errors to to_numeric_ignore_columns

This change is slightly cleaner because it addresses the root issue that some
columns are strings (and can therefore not be converted to numeric types). Now
if an error does occur when converting the dfs to numeric types it won't be
swallowed up.

* Add documentation to to_numeric_ignore_columns

---------

Co-authored-by: Cole Lyman <[email protected]>

---------

Co-authored-by: Cole Lyman <[email protected]>

* GitHub actions integration tests (#48)

* GitHub actions clean (#40)

* Create pytest.yml

* Create pylint.yml

* Create .pylintrc

* Create test_env.yml

* Full path

* Remove conda install

* Replace path

* Pytest tests

* pip -e

* Create integration_tests.yml

* Simplify name

* CRISPRESSO2_DIR environment variable

* Up one dir

* ls workspace

* Install CRISPResso and ydiff

* Clone repo instead of checkout

* submodule

* ls

* CRISPResso2_copy

* ls

* Update env

* Simplify

* Pull from githubactions branch

* Pull githubactions repo

* Checkout githubactions

* Mckay/pd warnings (#45)

* refactor errors='ignore' to try except

* refactored integer slice to iloc[]

* moved to_numeric try except to function

* Refactor to_numeric_ignore_errors to to_numeric_ignore_columns

This change is slightly cleaner because it addresses the root issue that some
columns are strings (and can therefore not be converted to numeric types). Now
if an error does occur when converting the dfs to numeric types it won't be
swallowed up.

* Add documentation to to_numeric_ignore_columns

---------

Co-authored-by: Cole Lyman <[email protected]>

* Run tests individually

* Pin plotly version

* Run all tests even if one fails

* Test on another branch

* Switch branch with token

* Update integration_tests.yml

* Introduce pandas sorting in CRISPRessoCompare (#47)

* New makefile commands

* Fix interleaved fastq input in CRISPRessoPooled and suppress CRISPRessoWGS params (#42)

* Extract out split_interleaved_fastq function to CRISPRessoShared

* Implement splitting interleaved fastq files in CRISPRessoPooled

* Suppress split_interleaved_input from CRISPRessoWGS parameters

* Suppress other parameters in CRISPRessoWGS

* Move where interleaved fastq files are split to be trimmed properly

* Bug Fix - 367 (#35)

* - Fixed references to ref_names_for_pe

* removed extra tabs

* trying to match empty line, no tabs

* - changed references to ref_names[0]

* Mckay/pd warnings (#45)

* refactor errors='ignore' to try except

* refactored integer slice to iloc[]

* moved to_numeric try except to function

* Refactor to_numeric_ignore_errors to to_numeric_ignore_columns

This change is slightly cleaner because it addresses the root issue that some
columns are strings (and can therefore not be converted to numeric types). Now
if an error does occur when converting the dfs to numeric types it won't be
swallowed up.

* Add documentation to to_numeric_ignore_columns

---------

Co-authored-by: Cole Lyman <[email protected]>

---------

Co-authored-by: Cole Lyman <[email protected]>

* On push no branches

* On push no branches

* All in one file

* Fix yml errors

* Rename jobs

* Remove old workflow files

* Remove paths

* Run jobs in parallel

---------

Co-authored-by: mbowcut2 <[email protected]>
Co-authored-by: Cole Lyman <[email protected]>

* 3.4->2.08

* Put ttf-mscorefonts-installer back above apt-get clean

* restore slash, replace fastp with trimmomatic and flash, add autoremove step

---------

Co-authored-by: mbowcut2 <[email protected]>
Co-authored-by: Cole Lyman <[email protected]>

* initial readme modifications

* Updated readme to remove deprecated commands, updated help text to reflect new version and fastp

* Pointing test branch back at master

---------

Co-authored-by: Cole Lyman <[email protected]>
Co-authored-by: mbowcut2 <[email protected]>
Co-authored-by: Samuel Nichols <[email protected]>

* Guardrails clean history (#34)

* Include guardrail functions

* Add CRISPRessoReports subtree

* Refactor to use CRISPRessoReports module

* Include guardrail functions

* Functional guardrails, needs reports update

* Add guardrail partial

* fix guardrails partial

* Bug Fix - 367 (#35)

* - Fixed references to ref_names_for_pe

* removed extra tabs

* trying to match empty line, no tabs

* - changed references to ref_names[0]

* Mckay/pd warnings (#45)

* refactor errors='ignore' to try except

* refactored integer slice to iloc[]

* moved to_numeric try except to function

* Refactor to_numeric_ignore_errors to to_numeric_ignore_columns

This change is slightly cleaner because it addresses the root issue that some
columns are strings (and can therefore not be converted to numeric types). Now
if an error does occur when converting the dfs to numeric types it won't be
swallowed up.

* Add documentation to to_numeric_ignore_columns

---------

Co-authored-by: Cole Lyman <[email protected]>

---------

Co-authored-by: Cole Lyman <[email protected]>

* GitHub actions integration tests (#48)

* GitHub actions clean (#40)

* Create pytest.yml

* Create pylint.yml

* Create .pylintrc

* Create test_env.yml

* Full path

* Remove conda install

* Replace path

* Pytest tests

* pip -e

* Create integration_tests.yml

* Simplify name

* CRISPRESSO2_DIR environment variable

* Up one dir

* ls workspace

* Install CRISPResso and ydiff

* Clone repo instead of checkout

* submodule

* ls

* CRISPResso2_copy

* ls

* Update env

* Simplify

* Pull from githubactions branch

* Pull githubactions repo

* Checkout githubactions

* Mckay/pd warnings (#45)

* refactor errors='ignore' to try except

* refactored integer slice to iloc[]

* moved to_numeric try except to function

* Refactor to_numeric_ignore_errors to to_numeric_ignore_columns

This change is slightly cleaner because it addresses the root issue that some
columns are strings (and can therefore not be converted to numeric types). Now
if an error does occur when converting the dfs to numeric types it won't be
swallowed up.

* Add documentation to to_numeric_ignore_columns

---------

Co-authored-by: Cole Lyman <[email protected]>

* Run tests individually

* Pin plotly version

* Run all tests even if one fails

* Test on another branch

* Switch branch with token

* Update integration_tests.yml

* Introduce pandas sorting in CRISPRessoCompare (#47)

* New makefile commands

* Fix interleaved fastq input in CRISPRessoPooled and suppress CRISPRessoWGS params (#42)

* Extract out split_interleaved_fastq function to CRISPRessoShared

* Implement splitting interleaved fastq files in CRISPRessoPooled

* Suppress split_interleaved_input from CRISPRessoWGS parameters

* Suppress other parameters in CRISPRessoWGS

* Move where interleaved fastq files are split to be trimmed properly

* Bug Fix - 367 (#35)

* - Fixed references to ref_names_for_pe

* removed extra tabs

* trying to match empty line, no tabs

* - changed references to ref_names[0]

* Mckay/pd warnings (#45)

* refactor errors='ignore' to try except

* refactored integer slice to iloc[]

* moved to_numeric try except to function

* Refactor to_numeric_ignore_errors to to_numeric_ignore_columns

This change is slightly cleaner because it addresses the root issue that some
columns are strings (and can therefore not be converted to numeric types). Now
if an error does occur when converting the dfs to numeric types it won't be
swallowed up.

* Add documentation to to_numeric_ignore_columns

---------

Co-authored-by: Cole Lyman <[email protected]>

---------

Co-authored-by: Cole Lyman <[email protected]>

* On push no branches

* On push no branches

* All in one file

* Fix yml errors

* Rename jobs

* Remove old workflow files

* Remove paths

* Run jobs in parallel

---------

Co-authored-by: mbowcut2 <[email protected]>
Co-authored-by: Cole Lyman <[email protected]>

* Update C cythonized files

* Add exact numbers to guardrails printouts

* Remove extraneous whitespace from CRISPRessoCOREResources.pyx

* Fix calculation of `total_mods` from being negative

The issue was that `all_deletion_coordinates` just tells you how many deletions
were present, but not how long the deletion is.
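The distinction behind this fix, number of deletions versus number of deleted bases, can be illustrated with a small sketch. It assumes `all_deletion_coordinates` is a list of half-open `(start, end)` pairs; that layout and both helper names are assumptions for the example, not confirmed details of the CRISPResso code.

```python
def count_deletion_events(all_deletion_coordinates):
    """Number of separate deletions, regardless of their length."""
    return len(all_deletion_coordinates)


def count_deleted_bases(all_deletion_coordinates):
    """Total number of deleted positions, assuming half-open (start, end) pairs."""
    return sum(end - start for start, end in all_deletion_coordinates)


# Two deletions: one spanning 5 bases, one spanning 2 bases.
coords = [(10, 15), (40, 42)]
events = count_deletion_events(coords)
bases = count_deleted_bases(coords)
```

Here `events` is 2 while `bases` is 7. A tally that mixes event counts with per-base counts of other modification types can therefore drift out of balance, which is consistent with the negative `total_mods` described above.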

* Changes to message

* Remove old tag

* Point tests at guardrails

* Restore C2 pro check

* Save message with guardrail name

* Point tests repo at master

---------

Co-authored-by: Cole Lyman <[email protected]>
Co-authored-by: mbowcut2 <[email protected]>

* Fix case sensitivity in Prime Editing mode (#54)

* Move read filtering to after merging in CRISPResso (#39)

* Move read filtering to after merging

This is in an effort to be consistent with the behavior and results of
CRISPRessoPooled.

* Properly assign the correct file names for read filtering

* Add space around operators

* GitHub actions on pr (#51)

* Run integration tests on pull_request

* Run pytest on pull_request

* Run pylint on pull_request

* Run tests on PR only when opening PR (#53)

* Update reports (#52)

* Update report changes

* Switch branch of integration test repo

* Remove extraneous `crispresso_data_path`

* Point integration tests back to master

* Make all amplicons in amplicon_seq_arr uppercase

This fixes https://github.com/pinellolab/CRISPResso2/issues/396

* Allow RNA values to be provided for prime_editing_pegRNA_scaffold_seq

* Fix 'Prime-edited' key not found (#32)

* Move 'Prime-edited' amplicon name check

By moving this, it will check if there is an amplicon named
'Prime-edited' (which is a reserved name) even if the
`prime_editing_pegRNA_extension_seq` parameter is empty.

* Only search for scaffold integration when pegRNA extension seq is provided

* Remove spaces at the end of lines

* Docker size (#49)

* Bug Fix - 367 (#35)

* - Fixed references to ref_names_for_pe

* removed extra tabs

* trying to match empty line, no tabs

* - changed references to ref_names[0]

* Mckay/pd warnings (#45)

* refactor errors='ignore' to try except

* refactored integer slice to iloc[]

* moved to_numeric try except to function

* Refactor to_numeric_ignore_errors to to_numeric_ignore_columns

This change is slightly cleaner because it addresses the root issue that some
columns are strings (and can therefore not be converted to numeric types). Now
if an error does occur when converting the dfs to numeric types it won't be
swallowed up.

* Add documentation to to_numeric_ignore_columns

---------

Co-authored-by: Cole Lyman <[email protected]>

---------

Co-authored-by: Cole Lyman <[email protected]>

* GitHub actions integration tests (#48)

* GitHub actions clean (#40)

* Create pytest.yml

* Create pylint.yml

* Create .pylintrc

* Create test_env.yml

* Full path

* Remove conda install

* Replace path

* Pytest tests

* pip -e

* Create integration_tests.yml

* Simplify name

* CRISPRESSO2_DIR environment variable

* Up one dir

* ls workspace

* Install CRISPResso and ydiff

* Clone repo instead of checkout

* submodule

* ls

* CRISPResso2_copy

* ls

* Update env

* Simplify

* Pull from githubactions branch

* Pull githubactions repo

* Checkout githubactions

* Mckay/pd warnings (#45)

* refactor errors='ignore' to try except

* refactored integer slice to iloc[]

* moved to_numeric try except to function

* Refactor to_numeric_ignore_errors to to_numeric_ignore_columns

This change is slightly cleaner because it addresses the root issue that some
columns are strings (and can therefore not be converted to numeric types). Now
if an error does occur when converting the dfs to numeric types it won't be
swallowed up.

* Add documentation to to_numeric_ignore_columns

---------

Co-authored-by: Cole Lyman <[email protected]>

* Run tests individually

* Pin plotly version

* Run all tests even if one fails

* Test on another branch

* Switch branch with token

* Update integration_tests.yml

* Introduce pandas sorting in CRISPRessoCompare (#47)

* New makefile commands

* Fix interleaved fastq input in CRISPRessoPooled and suppress CRISPRessoWGS params (#42)

* Extract out split_interleaved_fastq function to CRISPRessoShared

* Implement splitting interleaved fastq files in CRISPRessoPooled

* Suppress split_interleaved_input from CRISPRessoWGS parameters

* Suppress other parameters in CRISPRessoWGS

* Move where interleaved fastq files are split to be trimmed properly

* Bug Fix - 367 (#35)

* - Fixed references to ref_names_for_pe

* removed extra tabs

* trying to match empty line, no tabs

* - changed references to ref_names[0]

* Mckay/pd warnings (#45)

* refactor errors='ignore' to try except

* refactored integer slice to iloc[]

* moved to_numeric try except to function

* Refactor to_numeric_ignore_errors to to_numeric_ignore_columns

This change is slightly cleaner because it addresses the root issue that some
columns are strings (and can therefore not be converted to numeric types). Now
if an error does occur when converting the dfs to numeric types it won't be
swallowed up.

* Add documentation to to_numeric_ignore_columns

---------

Co-authored-by: Cole Lyman <[email protected]>

---------

Co-authored-by: Cole Lyman <[email protected]>

* On push no branches

* On push no branches

* All in one file

* Fix yml errors

* Rename jobs

* Remove old workflow files

* Remove paths

* Run jobs in parallel

---------

Co-authored-by: mbowcut2 <[email protected]>
Co-authored-by: Cole Lyman <[email protected]>

* 3.4->2.08

* Put ttf-mscorefonts-installer back above apt-get clean

* restore slash, replace fastp with trimmomatic and flash, add autoremove step

---------

Co-authored-by: mbowcut2 <[email protected]>
Co-authored-by: Cole Lyman <[email protected]>

* Guardrails clean history (#34)

* Include guardrail functions

* Add CRISPRessoReports subtree

* Refactor to use CRISPRessoReports module

* Include guardrail functions

* Functional guardrails, needs reports update

* Add guardrail partial

* fix guardrails partial

* Bug Fix - 367 (#35)

* - Fixed references to ref_names_for_pe

* removed extra tabs

* trying to match empty line, no tabs

* - changed references to ref_names[0]

* Mckay/pd warnings (#45)

* refactor errors='ignore' to try except

* refactored integer slice to iloc[]

* moved to_numeric try except to function

* Refactor to_numeric_ignore_errors to to_numeric_ignore_columns

This change is slightly cleaner because it addresses the root issue that some
columns are strings (and can therefore not be converted to numeric types). Now
if an error does occur when converting the dfs to numeric types it won't be
swallowed up.

* Add documentation to to_numeric_ignore_columns

---------

Co-authored-by: Cole Lyman <[email protected]>

---------

Co-authored-by: Cole Lyman <[email protected]>

* GitHub actions integration tests (#48)

* GitHub actions clean (#40)

* Create pytest.yml

* Create pylint.yml

* Create .pylintrc

* Create test_env.yml

* Full path

* Remove conda install

* Replace path

* Pytest tests

* pip -e

* Create integration_tests.yml

* Simplify name

* CRISPRESSO2_DIR environment variable

* Up one dir

* ls workspace

* Install CRISPResso and ydiff

* Clone repo instead of checkout

* submodule

* ls

* CRISPResso2_copy

* ls

* Update env

* Simplify

* Pull from githubactions branch

* Pull githubactions repo

* Checkout githubactions

* Mckay/pd warnings (#45)

* refactor errors='ignore' to try except

* refactored integer slice to iloc[]

* moved to_numeric try except to function

* Refactor to_numeric_ignore_errors to to_numeric_ignore_columns

This change is slightly cleaner because it addresses the root issue that some
columns are strings (and can therefore not be converted to numeric types). Now
if an error does occur when converting the dfs to numeric types it won't be
swallowed up.

* Add documentation to to_numeric_ignore_columns

---------

Co-authored-by: Cole Lyman <[email protected]>

* Run tests individually

* Pin plotly version

* Run all tests even if one fails

* Test on another branch

* Switch branch with token

* Update integration_tests.yml

* Introduce pandas sorting in CRISPRessoCompare (#47)

* New makefile commands

* Fix interleaved fastq input in CRISPRessoPooled and suppress CRISPRessoWGS params (#42)

* Extract out split_interleaved_fastq function to CRISPRessoShared

* Implement splitting interleaved fastq files in CRISPRessoPooled

* Suppress split_interleaved_input from CRISPRessoWGS parameters

* Suppress other parameters in CRISPRessoWGS

* Move where interleaved fastq files are split to be trimmed properly

* Bug Fix - 367 (#35)

* - Fixed references to ref_names_for_pe

* removed extra tabs

* trying to match empty line, no tabs

* - changed references to ref_names[0]

* Mckay/pd warnings (#45)

* refactor errors='ignore' to try except

* refactored integer slice to iloc[]

* moved to_numeric try except to function

* Refactor to_numeric_ignore_errors to to_numeric_ignore_columns

This change is slightly cleaner because it addresses the root issue that some
columns are strings (and can therefore not be converted to numeric types). Now
if an error does occur when converting the dfs to numeric types it won't be
swallowed up.

* Add documentation to to_numeric_ignore_columns

---------

Co-authored-by: Cole Lyman <[email protected]>

---------

Co-authored-by: Cole Lyman <[email protected]>

* On push no branches

* On push no branches

* All in one file

* Fix yml errors

* Rename jobs

* Remove old workflow files

* Remove paths

* Run jobs in parallel

---------

Co-authored-by: mbowcut2 <[email protected]>
Co-authored-by: Cole Lyman <[email protected]>

* Update C cythonized files

* Add exact numbers to guardrails printouts

* Remove extraneous whitespace from CRISPRessoCOREResources.pyx

* Fix calculation of `total_mods` from being negative

The issue was that `all_deletion_coordinates` just tells you how many deletions
were present, but not how long the deletion is.

* Changes to message

* Remove old tag

* Point tests at guardrails

* Restore C2 pro check

* Save message with guardrail name

* Point tests repo at master

---------

Co-authored-by: Cole Lyman <[email protected]>
Co-authored-by: mbowcut2 <[email protected]>

---------

Co-authored-by: Samuel Nichols <[email protected]>
Co-authored-by: mbowcut2 <[email protected]>
Co-authored-by: trevormartinj7 <[email protected]>

* Batch d3 clean (#55)

* imports C2Pro plots if available

* added --use_matplotlib flag

* added C2Pro
matched api function signatures

* added api args for plotly

* added **kwargs

* renamed config to custom_config, more specificity

* added backend flag for plotly kaleido

* added pro_installed boolean for templates, added plotly dependency to report templates

* Squashed commit of the following:

commit c909ea3b34e87ce637e00dac075d2bb2f8bfb954
Author: McKay <[email protected]>
Date:   Thu Feb 15 15:55:23 2024 -0700

    added plotly dependency for pro

commit 76b3601f6a0144f100266153f1c999e0c5de65de
Author: Samuel Nichols <[email protected]>
Date:   Fri Jan 12 09:56:19 2024 -0700

    Squashed commit of the following:

    commit 603f2eff9d1aa21ae95f3e134da303b8018d3a33
    Author: Samuel Nichols <[email protected]>
    Date:   Fri Jan 12 09:48:20 2024 -0700

        fix guardrails partial

    commit 22fc03183a8070c30dfb74d5c23575ac19019855
    Author: Samuel Nichols <[email protected]>
    Date:   Fri Jan 12 08:54:01 2024 -0700

        Add guardrail partial

    commit e55f6b21972b578261bc5a864ce1d653d98f9e34
    Author: Samuel Nichols <[email protected]>
    Date:   Mon Jan 8 07:50:59 2024 -0700

        Functional guardrails, needs reports update

    commit 6e968e9699ed59a47d88191d03768e042d8b60a4
    Merge: 32b49685 e948ce10
    Author: Samuel Nichols <[email protected]>
    Date:   Mon Dec 18 13:34:36 2023 -0700

        Merge branch 'guardrails-clean-history' of https://github.com/edilytics/CRISPResso2 into guardrails-clean-history

    commit 32b49685da320501dad2b0ebbb57887b66220ba8
    Author: Samuel Nichols <[email protected]>
    Date:   Fri Dec 15 15:27:04 2023 -0700

        Include guardrail functions

    commit 4e309cf6f732565d635de3d4c5d074ada3027e2d
    Author: Cole Lyman <[email protected]>
    Date:   Mon Dec 18 10:51:55 2023 -0700

        Refactor to use CRISPRessoReports module

    commit e648dc087c0055bc5d2fca13c64071a371dea941
    Author: Cole Lyman <[email protected]>
    Date:   Mon Dec 18 10:51:11 2023 -0700

        Add CRISPRessoReports subtree

    commit e948ce107ebb0d1d99010ed12e937f34b5e607d4
    Author: Samuel Nichols <[email protected]>
    Date:   Fri Dec 15 15:27:04 2023 -0700

        Include guardrail functions

    commit d33c748871a625facfe8d792e29c77ab9779138f
    Author: Kendell Clement <[email protected]>
    Date:   Tue Nov 7 16:31:06 2023 -0700

        Include parameter --assign_ambiguous_alignments_to_first_reference in readme

    commit a1435f7f491a6a61434f3051e39f39a4c9bf1edc
    Author: Kendell Clement <[email protected]>
    Date:   Wed Oct 11 17:17:30 2023 -0600

        Enable quantification by sgRNA (#348)

        This PR includes:
        - storing the sgRNA-specific editing locations in the crispresso2_info object. Previously, each amplicon would record the indices of quantification windows across the guide, but not for individual guides. This stores the information for each guide in crispresso2_info['results']['refs'][reference_name]['sgRNA_include_idxs']
        - a script (count_sgRNA_specific_edits.py) to parse through an allele table output from a completed CRISPResso run (`--write_detailed_allele_table` flag required) to count edits in each sgRNA separately.

        I don't have a good double-edited sample handy, but it can be run on the demo HDR data [hdr.fastq.gz](http://crispresso.pinellolab.org/static/demo/hdr.fastq.gz) using the command:

        ```

        CRISPResso -r1 hdr.fastq.gz -a acatttgcttctgacacaactgtgttcactagcaacctcaaacagacaccatggtgcatctgactcctgTggagaagtctgccgttactgccctgtggggcaaggtgaacgtggatgaagttggtggtgaggccctgggcaggttggtatcaaggtta -e acatttgcttctgacacaactgtgttcactagcaacctcaaacagacaccatggtgcaCctgactccGgaggagaagtctgccgttactgcGctgtggggcaaggtgaacgtggatgaagttggtggtgaggccctgggcaggttggtatcaaggtta -c atggtgcatctgactcctgTggagaagtctgccgttactgccctgtggggcaaggtgaacgtggatgaagttggtggtgaggccctgggcag -g TGCACCATGGTGTCTGTTTG,GATGAAGTTGGTGGTGAGGCCC --write_detailed_allele_table  -n hdr3 -p max -gn guide1,guide2
        ```

        ```
        python CRISPResso2/scripts/count_sgRNA_specific_edits.py -f CRISPResso_on_hdr3
        ```

        This produces:
        ```
        Processed 25000 alleles
        Reference: Reference (2391/23415 modified reads)
                UNMODIFIED: 21024
                MODIFIED guide1: 2359
                MODIFIED guide2: 32
        Reference: HDR (856/1577 modified reads)
                UNMODIFIED: 721
                MODIFIED guide1: 854
                MODIFIED guide1 + guide2: 1
                MODIFIED guide2: 1
        ```
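
The per-guide tally shown above can be pictured roughly as follows. This is a minimal sketch, not the code in count_sgRNA_specific_edits.py: it assumes each guide has a collection of reference indices (in the spirit of the `sgRNA_include_idxs` entry) and that each allele comes with a read count and a list of modified reference positions. The function name and input shapes are hypothetical.

```python
from collections import Counter


def count_edits_by_guide(alleles, guide_idxs):
    """Tally reads by which guide windows contain a modification.

    alleles: iterable of (n_reads, modified_positions) pairs (assumed shape).
    guide_idxs: dict mapping guide name -> iterable of reference indices.
    """
    windows = {name: set(idxs) for name, idxs in guide_idxs.items()}
    counts = Counter()
    for n_reads, modified_positions in alleles:
        modified = set(modified_positions)
        # Guides whose window overlaps at least one modified position
        hit = sorted(name for name, window in windows.items() if window & modified)
        label = "MODIFIED " + " + ".join(hit) if hit else "UNMODIFIED"
        counts[label] += n_reads
    return counts
```

An allele modified inside both windows is counted once under a combined label such as `MODIFIED guide1 + guide2`, mirroring the layout of the output above.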

    commit 2e3da02fdbed2fa8ae02a277763d65a502459827
    Author: Cole Lyman <[email protected]>
    Date:   Tue Oct 10 15:29:08 2023 -0600

        changed tuple to list for matplotlib change (#31) (#346)

        Co-authored-by: mbowcut2 <[email protected]>

    commit cd3c332135fe4db0f9218e3d87263d5c65838ed9
    Author: Kendell Clement <[email protected]>
    Date:   Sun Oct 1 01:54:46 2023 -0600

        rename script to camel case

    commit 7c719d65fb36ac7654db9040f226564ea28fcab9
    Author: Kendell Clement <[email protected]>
    Date:   Sun Oct 1 01:53:44 2023 -0600

        Add new script for counting high quality bases

    commit f97cd2795e89464bcc9321ccfdbca3e6af2bcb4f
    Author: Kendell Clement <[email protected]>
    Date:   Thu Sep 14 15:15:30 2023 -0600

        Prime editing alignment params (#336)

        Adds two parameters to control alignment of pegRNA components: --prime_editing_gap_open_penalty and --prime_editing_gap_extend_penalty.

        CRISPResso checks to see whether the pegRNA spacer and extension sequence are in the correct orientation, but sometimes they could align in the incorrect orientation with a higher score (e.g. via insertion of multiple gaps, whereas a single long gap would be preferred). Introducing these two parameters allows users to adjust the alignment parameters specifically for these prime-editing checks without adjusting the global alignment parameters which will be applied to reads that are aligned to the WT reference/prime-editing reference sequences.

        The new prime_editing_gap_open_penalty is set to -50, a higher gap open penalty than the default needleman_wunsch_gap_open penalty (-20). This commit breaks backward-reproducibility, but mostly in the checking of pegRNA component orientation - so previously some CRISPResso runs would have failed and produced an error, but now they will (hopefully) succeed. To achieve complete backward reproducibility, add the flag --prime_editing_gap_open_penalty -20 to runs.

    commit 64cbf36dae85cffa2c15e73f2a7ee8aa1077d917
    Author: Cole Lyman <[email protected]>
    Date:   Thu Sep 7 16:43:30 2023 -0600

        Fix samtools piping (#325)

        * Remove samtools pipe stderr to stdout

        Sometimes some of the libraries that samtools depends on don't have the correct
        version information, and as such samtools will report this to stderr when run.
        Because we pipe the output of samtools, we expect it to be valid SAM format, but
        when these library version messages are reported, it breaks CRISPRessoWGS.

        * Remove extra spacing at end of lines and add missing comma in WGS

        * Log stderr from samtools in CRISPRessoWGS
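
The idea behind these changes, keeping stderr out of the stream that is parsed as SAM while still logging it, can be sketched with the standard library. This is an illustration under assumptions, not the CRISPRessoWGS code: the helper name is hypothetical and the child process is a stand-in for samtools.

```python
import subprocess
import sys


def run_and_split_output(cmd):
    """Run cmd and return (stdout_text, stderr_text) as separate strings."""
    proc = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,  # captured separately, never merged into stdout
        text=True,
        check=True,
    )
    return proc.stdout, proc.stderr


# Stand-in child process (not samtools): one data line on stdout and one
# library warning on stderr.
demo_cmd = [
    sys.executable,
    "-c",
    "import sys; print('@HD VN:1.6'); print('library version warning', file=sys.stderr)",
]
```

The caller can then parse only the first returned string and pass the second one to its logger.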

    commit 8feff4101f27406d9d88ace97d31a518276bff3f
    Author: Cole Lyman <[email protected]>
    Date:   Fri Sep 1 09:43:56 2023 -0600

        Replace link to CRISPResso schematic with raw URL in README (#329)

        * Replace link to CRISPResso schematic with raw URL

        * Add new lines to the beginning of unordered lists

    commit 2e9e6bff5bcc536d5e2ba1440d1ab96d9d47efd6
    Author: Kendell Clement <[email protected]>
    Date:   Thu Aug 10 00:52:12 2023 -0600

        Try to unbreak CircleCI

    commit ae5b95246cb0f6d66c4cbfb50cf8f5a9626b0827
    Author: Kendell Clement <[email protected]>
    Date:   Thu Aug 10 00:17:27 2023 -0600

        Center command line text messages

    commit 4d9c71ecf2248c9bb1e10430178dc318b6621c8b
    Author: Kendell Clement <[email protected]>
    Date:   Thu Aug 10 00:17:07 2023 -0600

        Fix bug in prime-editing scaffold-incorporation plotting

        If read is too short, scaffold incorporation detection will fail because it will check beyond the length of the read.

    commit 2b36a1a5c35e8a93516ce8baf464595615e0f402
    Author: Kendell Clement <[email protected]>
    Date:   Wed Aug 9 15:29:48 2023 -0600

        CRISPRessoPooled --compile_postrun_references bug fixes

    commit 3e04d1d402bcf95edd39fc7c8c9af61bb380f9db
    Author: Kendell Clement <[email protected]>
    Date:   Tue Aug 8 23:30:15 2023 -0600

        Fix missing ' in Pooled --demultiplex_only_at_amplicons

    commit 06af527f9e2020c5cf251e7f1cec0b1eca1c1664
    Author: Cole Lyman <[email protected]>
    Date:   Mon Jul 24 10:47:46 2023 -0600

        Sort pandas dataframes by # of reads and sequences so that the order is consistent (#316)

        * Make sorting stable

        * Including c files

        * Sort by #Reads instead of %Reads to avoid floating point errors
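
Why sorting on the integer read count plus the sequence gives a consistent order can be shown in plain Python (used here in place of pandas to keep the sketch dependency-free). The row layout and key names are assumptions for illustration.

```python
def sort_alleles(rows):
    """Sort rows by read count (descending), then by sequence (ascending).

    Integer counts compare exactly, and the sequence breaks ties, so the
    result does not depend on the input order or on float rounding.
    """
    return sorted(rows, key=lambda row: (-row["n_reads"], row["sequence"]))
```

In pandas the analogous approach would be to sort on both columns at once rather than on the percentage column alone.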

        ---------

        Co-authored-by: Samuel Nichols <[email protected]>

    commit de05533b3511a84f3b6b14fc2ef64db041613261
    Author: Cole Lyman <[email protected]>
    Date:   Thu Jul 6 13:54:45 2023 -0600

        Fix multiprocessing lambda pickling (#311)

        * Fix running plots in parallel

        The reason the plots were running slower before this change is because I was
        calling the plot function, not passing it to `submit`. So it was essentially
        running in serial, but worse because it was still spinning up/down the
        processes.

        * Fix multiprocessing lambda pickling (#20)

        * Refactor process_futures to be a dict

        This makes debugging much easier because you can associate the arguments to the
        future with the results.

        * Fix the pickling error when running in multiprocessing

        Only top-level functions (not lambdas) can be pickled to use in multiprocessing
        pools, thus the lambdas are converted to a regular function.

        * Further fixes to pickling multiprocessing error (#21)

        * Refactor process_futures to be a dict

        This makes debugging much easier because you can associate the arguments to the
        future with the results.

        * Fix the pickling error when running in multiprocessing

        Only top-level functions (not lambdas) can be pickled to use in multiprocessing
        pools, thus the lambdas are converted to a regular function.

        * Use Counter instead of defaultdict in CRISPRessoCORE

        * Update process_futures to dict in Batch and Aggregate
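
The three points in this commit (pass the function to `submit` instead of calling it, keep futures in a dict, avoid lambdas) can be sketched as below. This is a hedged illustration, not the CRISPResso code: `make_plot`, `can_pickle`, and `run_plots` are hypothetical names, and the sketch accepts any `concurrent.futures` executor.

```python
import pickle
from concurrent.futures import ThreadPoolExecutor, as_completed


def make_plot(name, n_points):
    """Hypothetical stand-in for a plotting function; returns a label."""
    return f"{name}:{n_points}"


def can_pickle(obj):
    """Return True if obj survives pickle.dumps, as process pools require."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


def run_plots(executor, plot_fn, plot_args):
    """Submit one task per argument dict and collect results by plot name."""
    process_futures = {}
    for args in plot_args:
        # Pass the callable and its arguments separately. Writing
        # executor.submit(plot_fn(**args)) would run the plot in the parent.
        future = executor.submit(plot_fn, **args)
        process_futures[future] = args  # lets us tie results back to inputs
    results = {}
    for future in as_completed(process_futures):
        args = process_futures[future]
        results[args["name"]] = future.result()  # re-raises worker exceptions
    return results
```

With a process pool the submitted callable has to be picklable, which is why a module-level function is used here; `can_pickle(lambda x: x)` returns False.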

    commit ebb016dff46c280dce8c3c09e8ac0e0cc25d4d74
    Author: Kendell Clement <[email protected]>
    Date:   Mon Jul 3 17:12:09 2023 -0600

        Enable CRISPRessoPooled multiprocessing when os allows multi-thread file append

    commit 7285da0e987b77b72c8885bb35940e0f50c146bd
    Author: Kendell Clement <[email protected]>
    Date:   Fri Jun 23 16:50:33 2023 -0600

        Fix print bug for invalid fastq

    commit 9acdeac67441f9a1d55ac94b153bcb68fb89b92c
    Author: kclem <[email protected]>
    Date:   Wed Jun 21 16:03:48 2023 -0600

        Slugify before creating filename - replaces invalid characters in batch names with _
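
A minimal sketch of this kind of replacement, assuming that anything other than letters, digits, `_`, `-`, and `.` counts as invalid (the actual character set used by CRISPResso may differ):

```python
import re


def slugify(name):
    """Replace characters outside letters, digits, '_', '-', '.' with '_'."""
    return re.sub(r"[^A-Za-z0-9_.-]", "_", name)
```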

    commit f97e29c67de4c80b8d6b9cf334f363be4b514ade
    Author: Cole Lyman <[email protected]>
    Date:   Wed Jun 21 14:43:43 2023 -0600

        Add verbosity argument to CRISPRessoAggregate (#18) fixes #306 (#307)

        * Add verbosity argument to CRISPRessoAggregate (#18)

        * Allow for amplicon and guide seqs to be some variant of NA in batch (#19)

        This was discovered when attempting to infer amplicon sequences in batch mode on
        the web interface, NAs were supplied for the amplicon sequences to the sub
        CRISPResso commands.

    commit 32e1e9797da5c3033cdc588e92f06b8813961953
    Author: Mark Clement <[email protected]>
    Date:   Wed Jun 21 14:01:00 2023 -0600

        Allow for interrogation of overlapping sgRNA sites

    commit 7248ba8c4deee125ad1ec12fdf1294a84d5f6f93
    Author: Kendell Clement <[email protected]>
    Date:   Mon Jun 12 12:16:47 2023 -0600

        Check input fastq file format

        Asserts input format of fastq files - including if gzipped files are missing the gz suffix.
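
One way such a check could look is sketched below. It is an assumption-laden illustration rather than the CRISPResso implementation: it detects gzip data by its two magic bytes, treats a missing `.gz` suffix as an error, and validates only the first record.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def check_fastq(path):
    """Raise ValueError if path does not look like FASTQ; return True if gzipped."""
    with open(path, "rb") as handle:
        is_gzipped = handle.read(2) == GZIP_MAGIC
    if is_gzipped and not path.endswith(".gz"):
        raise ValueError(f"{path} is gzip-compressed but lacks the .gz suffix")
    opener = gzip.open if is_gzipped else open
    with opener(path, "rt") as handle:
        # A FASTQ record is four lines: header, sequence, separator, qualities
        header, seq, plus, qual = (handle.readline().rstrip("\n") for _ in range(4))
    if not header.startswith("@") or not plus.startswith("+") or len(seq) != len(qual):
        raise ValueError(f"{path} does not look like a FASTQ file")
    return is_gzipped
```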

    commit 83c8ab8f462e7d8c1d04c08c1a398b874f517251
    Author: Kendell Clement <[email protected]>
    Date:   Mon Jun 5 13:41:55 2023 -0600

        Fix CRISPRessoArgParser

    commit 14a2c8577f566e1b72d5f4e72cd6cd22079610be
    Author: Kendell Clement <[email protected]>
    Date:   Mon Jun 5 13:29:31 2023 -0600

        Cosmetic updates for command-line use

        - version bump to 2.2.13
        - If no args are provided, the command line version will print out an abbreviated help message
        - parameters can be excluded from CRISPRessoArgParser

    commit 1cd54bc1d03360c3d8121ba9e66b3589fe1cf252
    Author: Cole Lyman <[email protected]>
    Date:   Thu May 11 14:31:47 2023 -0600

        Fix multiprocessing error, don't start pool when only using single thread (#302)

        * Update README to have consistent use of `--base_editor_output` (#16)

        * Add files via upload

        * Only start process pools when using multiple processes

        This is mainly to solve the issue when running on AWS Lambda, but this should
        improve single core performance overall.

        ---------

        Co-authored-by: Kendell Clement <[email protected]>

    commit 92a705c939b370373a70cf6ae9f1616de33288b9
    Author: Cole Lyman <[email protected]>
    Date:   Thu May 11 14:31:06 2023 -0600

        Update `base_editor` parameters in README and add Plot Harness (#301)

        * Update README to have consistent use of `--base_editor_output` (#16)

        * Add files via upload

        ---------

        Co-authored-by: Kendell Clement <[email protected]>

    commit 7d46c4490235df45c5546b1b470e4e6a99727031
    Author: Cole Lyman <[email protected]>
    Date:   Wed May 10 15:41:33 2023 -0600

        Clarify CRISPRessoWGS intended use (#303)

        * Update README to have consistent use of `--base_editor_output` (#16)

        * Add sample plotting jupyter notebook

        * Add clarifying info to CRISPRessoWGS description

        Clarify WGS usage

    commit 833a701787bb47674b3e921c38cac6189c775cf7
    Author: Kendell Clement <[email protected]>
    Date:   Thu May 4 17:02:46 2023 -0400

        Remove debug print statements

    commit 712eb2a11825e8d36f2870deb12b35486bd633fb
    Author: Kendell Clement <[email protected]>
    Date:   Thu May 4 16:40:07 2023 -0400

        Allow dashes in filenames resolve #73

    commit a439f094745b2b5e7f032f0777d4c67e6d6f93c5
    Author: Kendell Clement <[email protected]>
    Date:   Sat Apr 22 23:41:58 2023 -0400

        Raise exceptions from within futures in plot_pool

    commit 7e807a60de2a9d18bccd034b87106ceaf7153338
    Author: Kendell Clement <[email protected]>
    Date:   Sat Apr 22 23:38:56 2023 -0400

        Fix future pandas indexing warning

        Pandas error was "FutureWarning: Calling float on a single element Series is deprecated and will raise a TypeError in the future. Use float(ser.iloc[0]) instead"

    commit 304a92aa7a7ef8c705cb070dce25d9a2e5745ba9
    Author: Cole Lyman <[email protected]>
    Date:   Thu Apr 20 13:59:27 2023 -0600

        Remove debug print statements fixes #295 (#297)

        The format string option used here is only available in Python version >=3.8.

    commit 478c06f784603e96d20f96e91993fdcc4ac35c8a
    Author: Kendell Clement <[email protected]>
    Date:   Thu Apr 13 12:09:26 2023 -0400

        Update plotCustomAllelePlot.py script for #292 (#293)

        Update type of 'max_rows' param to int
        Fix location of 'args' in crispresso2_info object

    commit bcdae39e05d530f4a4e78738c3b30f7664981919
    Author: Kendell Clement <[email protected]>
    Date:   Mon Mar 27 13:18:34 2023 -0400

        Update pooled parameter format

    commit 546446e36e7e68b527767d6c31ec341a49df2059
    Author: Kendell Clement <[email protected]>
    Date:   Tue Feb 14 16:26:23 2023 -0500

        Fix running plots in parallel (#286)

        The reason the plots were running slower before this change is because I was
        calling the plot function, not passing it to `submit`. So it was essentially
        running in serial, but worse because it was still spinning up/down the
        processes.

        Co-authored-by: Cole Lyman <[email protected]>

    commit d75f32a2eb5aeaaee866c09e5655a3e27af8b1a1
    Author: kclem <[email protected]>
    Date:   Fri Feb 10 15:45:15 2023 -0500

        Fix #283 to avoid filename collisions

        Previously, amplicon names longer than 21bp were truncated, but the check for uniqueness wasn't working, so it would overwrite some plot files. This fixes the filename collision and enforces uniqueness in reference filename prefixes. Thanks @mbiokyle29
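
Enforcing uniqueness after truncation could be sketched as follows. The 21-character limit comes from the description above; the numeric-suffix scheme and the function name are assumptions, not the actual fix.

```python
def unique_prefixes(names, max_len=21):
    """Map each (distinct) name to a truncated prefix that is unique.

    On collision, a numeric suffix replaces the tail of the prefix so the
    result still fits within max_len characters.
    """
    used = set()
    prefixes = {}
    for name in names:
        candidate = name[:max_len]
        counter = 2
        while candidate in used:
            suffix = f"_{counter}"
            candidate = name[:max_len - len(suffix)] + suffix
            counter += 1
        used.add(candidate)
        prefixes[name] = candidate
    return prefixes
```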

    commit e577318006cd17b2725bd028e5e56634c6eb829a
    Author: kclem <[email protected]>
    Date:   Mon Feb 6 16:37:25 2023 -0500

        Case-insensitive headers accepted in CRISPRessoPooled

    commit d34927620a4a6126a9988b3041e76f60728abbfe
    Author: Kendell Clement <[email protected]>
    Date:   Tue Jan 31 13:48:33 2023 -0500

        Fix print statement in CORE

    commit ee88b7ed89c395f68225a50dea44a2ad69d5e9a5
    Author: Kendell Clement <[email protected]>
    Date:   Tue Jan 31 13:22:51 2023 -0500

        Version bump to 2.2.12

    commit 1d4679c72d0c8b4154317c9aff5179217198e2d7
    Author: Kendell Clement <[email protected]>
    Date:   Tue Jan 31 13:01:31 2023 -0500

        Status Updates + Pooled Mixed Mode Update (#279)

        * Implement logging handler to overwrite the latest log status to file

        * Add StatusHandler to CRISPRessoCORE log

        This will take the latest log output and write it to a file (`status.txt`), the
        catch being that with each log the file is overwritten so that one can easily
        tell where CRISPResso currently is and what the error is (if any). These changes
        include some slight refactoring in order to accommodate any potential parameter
        exceptions.

        * Add StatusHandler to CRISPRessoBatch and refactor `logger.warn` to `warn`

        * Add StatusHandler to CRISPRessoPooled and a little refactoring

        * Implement `percent_complete` to the status log

        * Add StatusHandler to CRISPRessoAggregate log

        * Add StatusHandler to CRISPRessoCompare log

        * Add StatusHandler to CRISPRessoPooledWGSCompare log

        * Add StatusHandler to CRISPRessoWGS log

        * Rename `status.txt` to `CRISPResso_status.txt`

        * Modify status log names to match the tool they are generated from

        * Add percent_complete stages to CRISPRessoCORE

        These also include log statements of each plot that is being generated as well
        as fixing some variable name collisions with `ind`.

        * Format the percentage in the log to be 2 decimal places

        * Change all plotting logs from `info` to `debug` and simplify progress

        This refactors how the progress of the plots is calculated, making it much
        simpler. Before this change we would have had to keep track of the number of
        times `percent_complete` was output, but now it simply updates the percent
        complete after each amplicon is finished processing. Hopefully this will make
        things easier to maintain even though it will be a little less "accurate" (not
        sure how accurate the original implementation was...).

        * Implemented shared console log handler across all CRISPResso* calls

        This allows for easy changes to logging formatting, which was inspired by having
        to change the default logging level. The default logging level needs to be set
        at `logging.DEBUG` in order for the debug log statements to not be ignored for
        the running and status logs.

        * Add ability to set the verbosity level to each CRISPResso* tool

        This allows users to set a verbosity level between 1 and 4 using the
        `-v`/`--verbosity` CLI parameter. If the `--debug` flag is present, then the
        level will default to 4, being the most verbose.

        * Implement showing the last seen `percent_complete` when none is provided

        * Keep track of and log when multiple parallel runs are completed

        These changes modify `CRISPRessoMultiProcessing.run_crispresso_cmds` such that
        we can now display when a run is completed. This potentially breaks how
        signals and interrupts are handled with multiple runs happening, but this needs
        to be reviewed.

        * Add debug and percentage complete to CRISPRessoBatch

        * Add percent complete to CRISPRessoPooled

        * Add debug and percent_complete message to CRISPRessoAggregate

        * Add `percent_complete` to CRISPRessoCompare

        * Add `percent_complete` to CRISPRessoPooledWGSCompare

        * Add status and `percent_complete` to CRISPRessoMeta

        * Add `verbosity` arguments to CRISPRessoCompare and CRISPRessoPooledWGSCompare

        * Fixing documentation to match pooled headers

        * Header removal bug fix change documentation to guide_seq

        * Update documentation and help feature for CRISPRessoPooled

        * Remove extra newlines from CRISPRessoPooled -h

        * Make variable names as clear as my firstborn child's name

        * Update one more variable name

        * Fix bug to flow CRISPRessoPooled options to sub command

        * Make amplicon file args variable name clear

        * Update how parameters are set and retrieved from parameter object

        The refactor in the previous commit changed the type of the arguments to a
        dictionary which doesn't have the parameters as attributes, and this commit
        fixes that error.

        * Add note in output header for change in default CRISPRessoPooled

        In the next release (2.3.0) the `--demultiplex_only_at_amplicons` will be the
        default when running in mixed-mode. This is to allow for inexact alignments of
        the reads and the amplicons to the genome. For more context, see this issue
        https://github.com/pinellolab/CRISPResso2/issues/276

        * Clarify the verbosity parameter help message

        * Separate out parameters to `normalize_name` in CRISPRessoCORE

        * Separate out parameters to `normalize_name` in CRISPRessoWGS

        * Separate out parameters to `normalize_name` in CRISPRessoPooled

        * Separate out parameters to `normalize_name` in CRISPRessoCompare

        * Fix bug in CRISPRessoPooled by replacing `database_id` with `normalize_name`

        * Refactor `run_crispresso_cmds` to not require a `logger`

        This commit implements the functionality to make the `logger` object optional by
        seeing which module called the `run_crispresso_cmds` function and obtaining the
        correct object from that module name.

        The function also immediately returns when no commands are passed to it.

        * Add amplicon name to plotting debug statements in CRISPRessoCORE
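
The status-file behavior described in this commit (each log record overwrites the file, and the last seen `percent_complete` is reused when a record carries none) can be sketched with the standard `logging` module. The class below borrows the `StatusHandler` name from the commit message, but its body is an assumption for illustration, not the CRISPResso implementation.

```python
import logging


class StatusHandler(logging.Handler):
    """Overwrite a status file with the most recent log record."""

    def __init__(self, filename):
        super().__init__()
        self.filename = filename
        self.last_percent_complete = 0.0

    def emit(self, record):
        # Records may carry percent_complete via logger.info(..., extra={...})
        percent = getattr(record, "percent_complete", None)
        if percent is None:
            percent = self.last_percent_complete  # reuse the last seen value
        else:
            self.last_percent_complete = percent
        # Mode "w" truncates, so the file only ever holds the latest status
        with open(self.filename, "w") as handle:
            handle.write(f"{percent:.2f}% {record.levelname}: {record.getMessage()}\n")
```

A caller would attach the handler to its logger and log with, for example, `logger.info("Aligning reads", extra={"percent_complete": 25})`. The logger level has to be `logging.DEBUG` for debug records to reach the file, which matches the note above about the default logging level.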

        ---------

        Co-authored-by: Cole Lyman <[email protected]>
        Co-authored-by: Cole Lyman <[email protected]>
        Co-authored-by: Cole Lyman <[email protected]>
        Co-authored-by: Samuel Nichols <[email protected]>

    commit ff7eca76e6a3a08af4ac18ac4e88d20f2a06b1f9
    Author: Kendell Clement <[email protected]>
    Date:   Thu Jan 26 15:27:27 2023 -0500

        CRISPRessoPooled custom header fix (#278)

        * Fixing documentation to match pooled headers

        * Header removal bug fix change documentation to guide_seq

        * Update documentation and help feature for CRISPRessoPooled

        * Remove extra newlines from CRISPRessoPooled -h

        * Make variable names as clear as my firstborn child's name

        * Update one more variable name

        Co-authored-by: Samuel Nichols <[email protected]>

    commit 104866e1080c973bb025d1a5ba59b19dca1658af
    Author: Cole Lyman <[email protected]>
    Date:   Thu Jan 5 14:00:26 2023 -0700

        Fix deprecated numpy type names (fixes #269) (#270)

        In the most recent version of numpy (1.24) some of the types have been
        deprecated. This commit fixes these errors.

    commit 58a8e42df88b66fad6b4f6ad04a5b9d9d43d01b4
    Author: Cole Lyman <[email protected]>
    Date:   Thu Jan 5 06:49:35 2023 -0700

        Add snippet about installing CRISPResso2 via bioconda on Apple silicon (#274)

        I have suffered enough trying to debug my installation, so hopefully this helps
        someone else.

        Co-authored-by: Cole Lyman <[email protected]>

    commit b9851e98104602eb78c2b384105267624295e9d3
    Author: Cole Lyman <[email protected]>
    Date:   Thu Dec 22 13:30:23 2022 -0700

        Fix bug when pooled bam is input (#265)

        This change checks to see if a bam file was input, and if so it doesn't try to
        remove any intermediate files because there aren't any.

        Co-authored-by: Cole Lyman <[email protected]>

    commit b822612642043e75a19042941f69b457ce51f517
    Author: Kendell Clement <[email protected]>
    Date:   Mon Dec 19 15:26:45 2022 -0500

        Delete vscode settings

    commit b99aa624dec68ef7d19264340ce0cafa829625f4
    Author: Kendell Clement <[email protected]>
    Date:   Mon Dec 19 13:29:14 2022 -0500

        Clarify input param help for pooled bam

    commit 3fae1e8b821ec6b1890bff6561fa8fa67dc49a04
    Author: Kendell Clement <[email protected]>
    Date:   Mon Dec 19 13:28:54 2022 -0500

        Fix #235 - Cigar string is * if read unaligned

        Previously, the bam would set the cigar string to 0 if the read was unaligned. This breaks the sam->bam conversion and causes the errors in #235.
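
For reference, a SAM record for an unaligned read uses `*` for the CIGAR field. The helper below is a hypothetical sketch of building such a line (it is not the CRISPResso code), using the 11 mandatory SAM columns.

```python
def unaligned_sam_line(read_name, seq, qual):
    """Build a tab-separated SAM line for a read with no alignment."""
    fields = [
        read_name,  # QNAME
        "4",        # FLAG: 0x4 marks the segment as unmapped
        "*",        # RNAME: no reference
        "0",        # POS
        "0",        # MAPQ
        "*",        # CIGAR: '*' means unavailable (not '0')
        "*",        # RNEXT
        "0",        # PNEXT
        "0",        # TLEN
        seq,        # SEQ
        qual,       # QUAL
    ]
    return "\t".join(fields)
```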

    commit c65ba07dc5a983453cdf7bb1e27005230dac6f1b
    Author: Cole Lyman <[email protected]>
    Date:   Thu Dec 8 13:48:17 2022 -0700

        Add deprecation notice (#260)

        * Add FLASh and Trimmomatic deprecation notice to CLI output

        * Add Edilytics email address to CLI output

    commit 2a30e5a45f5350ee7c6435bce1cd4edc4d31668a
    Author: Kendell Clement <[email protected]>
    Date:   Tue Dec 6 12:16:19 2022 -0500

        Format filterReadsOnSequencePresence script

    commit 9d764414edd88a46ad5e4f496e4f1c8d5d60ce3e
    Author: Kendell Clement <[email protected]>
    Date:   Fri Dec 2 22:12:54 2022 -0500

        Clarify default CRISPRessoPooled settings for use_legacy_bowtie2_options_string

    commit 9ddea40f7f02b546941ddaa4c71fc5283075051a
    Author: kclem <[email protected]>
    Date:   Mon Nov 14 10:33:04 2022 -0500

        Add check for prime editing extension sequence in prime edited sequence

        If the user specifies prime_editing_override_prime_edited_ref_seq, that sequence might not contain the extension seq (for example, if the extension seq is not provided in the appropriate orientation), so check for that here. The extension sequence should be provided as the reverse complement of the prime edited sequence.
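
The orientation requirement can be illustrated with a small check: reverse-complement the extension sequence and look for it in the prime edited sequence. This is a sketch with hypothetical function names, handling only A/C/G/T, and is not the check as implemented in CRISPResso.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def extension_in_edited_ref(extension_seq, prime_edited_seq):
    """True if the reverse-complemented extension occurs in the edited sequence."""
    return reverse_complement(extension_seq).upper() in prime_edited_seq.upper()
```

An extension given in the same orientation as the prime edited sequence would generally fail this check, which is the situation the commit guards against.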

    commit 152f2dd5001da7090641ee8a1326bde9f7e8104e
    Author: kclem <[email protected]>
    Date:   Wed Nov 9 11:53:41 2022 -0500

        Version bump to 2.2.11a

    commit 9ed356e3a0c6c316d0860d121772f80ddca6de1d
    Author: kclem <[email protected]>
    Date:   Wed Nov 9 11:47:30 2022 -0500

        Add param to override prime editing sequence checks

        CRISPResso checks that prime editing guides are provided in the proper orientation (e.g. pegRNA 3'->5', spacer sequence 5'->3') and checks these orientations by alignment. Sometimes, the alignment can be better in the opposite direction, and this parameter allows these checks to be overridden. Otherwise, these checks would halt the program and produce the output 'The prime editing pegRNA spacer sequence appears to be given in the 3\'->5\' order. The prime editing pegRNA spacer sequence (--prime_editing_pegRNA_spacer_seq) must be given in the RNA 5\'->3\' order.'

    commit 39dd80afb98a22b7edb6f801c363d86bb77eeb5b
    Author: kclem <[email protected]>
    Date:   Wed Nov 9 10:06:51 2022 -0500

        Update filterReadsOnSequencePresence.py

    commit fe55526927e3fb6e17c9a8a6f59c7057bc1e14eb
    Author: Kendell Clement <[email protected]>
    Date:   Mon Nov 7 22:25:16 2022 -0500

        Add script to filter input based on sequence presence

    commit 713e57a19c35180035ca35e11a5820065eda0198
    Author: Kendell Clement <[email protected]>
    Date:   Tue Oct 18 16:02:26 2022 -0400

        Allow spaces in read names for CRISPRessoWGS

    commit 39ce008bdddccdd8229c0ba185dce78bc2f66968
    Author: Cole Lyman <[email protected]>
    Date:   Sat Oct 8 21:09:58 2022 -0600

        Fix typo of CRISPResssoPlot when plotting nucleotide quilt (#250)

    commit 6a2b342c8503b7327c0a2414edfbd16912d60ca5
    Author: Kendell Clement <[email protected]>
    Date:   Sat Oct 8 23:08:47 2022 -0400

        Batch amplicon plots (#251)

        * Error out if HDR amplicon matches existing amplicon

        * Add check for amplicon sequence uniqueness

        * Fix bug with bam_input not having bam_output

        * Test for no returned lines in auto mode, version bump to 2.2.11

        * Fix pandas deprecation of df.append

    commit 726b2b93d6e419a1b0aa6a968c97edc55b4cc5a8
    Author: Kendell Clement <[email protected]>
    Date:   Thu Oct 6 16:32:02 2022 -0400

        Fix CRISPRessoBatch plot pool bug when plots are suppressed

    commit 7e5049c4dfb88cbc87c91935a91d1f51120a10c2
    Author: Cole Lyman <[email protected]>
    Date:   Wed Sep 21 21:04:51 2022 -0600

        Fix batch quilt plot name (#249)

        This fixes an incorrectly named allele quilt plot input in CRISPRessoBatch.

    commit 1821ca5029c5a1485733f13ab3f2048b4f1fa04e
    Author: Kendell Clement <[email protected]>
    Date:   Thu Sep 15 15:49:08 2022 -0400

        Version bump to 2.2.10

    commit c5f79aebfc1ae209f4ee320df250eed89a02787c
    Author: Cole Lyman <[email protected]>
    Date:   Wed Sep 14 14:24:55 2022 -0600

        Parallel plot refactor (#247)

        * Fix duplicate plotting in CRISPRessoBatch aggregate

        * Refactor multiprocessing plots in CRISPRessoBatch

        * Refactor multiprocessing plots in CRISPRessoCORE

        * Refactor multiprocessing plots for CRISPRessoAggregate

    commit 4ed5e24e6cc1dd8068e2391573ae2438acd32db2
    Author: Kendell Clement <[email protected]>
    Date:   Tue Sep 13 14:12:11 2022 -0400

        print files in curr dir if Aggregate can't find files

    commit ce25bc06f29988e7a10afd0b6a09ba0caf0950e0
    Author: Kendell Clement <[email protected]>
    Date:   Mon Sep 12 10:32:57 2022 -0400

        Spelling typo

    commit c15f01c75083403f17c58c121b2afe97e9f2a1ec
    Author: Kendell Clement <[email protected]>
    Date:   Tue Sep 6 17:49:52 2022 -0400

        Add helper function to create alignment scoring matrix

        New scoring matrix can be created using CRISPResso2Align.make_matrix()

    commit c80f82838c5a228b79ad4484092877cfee08e02c
    Author: Cole Lyman <[email protected]>
    Date:   Mon Aug 22 18:28:33 2022 -0600

        Add `zip_output` (#240)

        * Making zip of results

        * Zip command added, if zip is true place_report_in_output_folder is also true, zip removes all files while zipping

        * Adding --zip to compare and pooled/wgs compare

        * Add more formatting changes to CRISPRessoShared

        * Refactoring propagate_crispresso_options so only one version exists

        * Zip added to arguments_to_ignore and warning added when changing arguments

        * Restore styling

        * Update README to include --zip

        * Rename --zip to --zip_output

        * Change --zip to --zip_output in CompareCORE and PooledWGSCompareCORE

        * Bug fix arg to args

        Co-authored-by: Samuel Nichols <[email protected]>

    commit 5de3d7286d8e33c7cf4d3615fce715806e72f511
    Author: Kendell Clement <[email protected]>
    Date:   Thu Aug 11 21:42:34 2022 -0400

        Fix fix to aggregate for CRISPRessoWGS

    commit a2294c266f43b14969a5d6474076f31a77a57173
    Author: Kendell Clement <[email protected]>
    Date:   Thu Aug 11 21:40:50 2022 -0400

        Fix bug in aggregate for WGS

    commit 7ce3eb4abe4b8ceac933272ac9cb16a8bedf26a3
    Author: Kendell Clement <[email protected]>
    Date:   Mon Aug 8 21:53:45 2022 -0400

        Update CRISPRessoWGS to allow non-word characters in region names

    commit 040ac0033d6e250f4e3a412101874cf5e914e08a
    Author: kclem <[email protected]>
    Date:   Mon Aug 8 16:04:59 2022 -0400

        Enable processing of cram files by CRISPRessoWGS

        Adds --reference to samtools view when viewing cram files

    commit cf112a0caba8789e28530cc09171285ec6ea9b4c
    Author: kclem <[email protected]>
    Date:   Mon Aug 8 14:55:46 2022 -0400

        Auto amplicon detection for interleaved input

        Enables processing of interleaved fastq files for guess_guides and guess_amplicons, as well as get_most_frequent_reads. When interleaved input is present, the input is first separated into R1/R2 files, then processing is performed.
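
Separating interleaved input amounts to routing alternating four-line records to R1 and R2. A minimal in-memory sketch (hypothetical function, assuming records strictly alternate R1, R2, R1, R2, ...):

```python
def split_interleaved(lines):
    """Split interleaved FASTQ lines into (r1_lines, r2_lines)."""
    if len(lines) % 8 != 0:
        raise ValueError("expected a whole number of read pairs (8 lines each)")
    r1, r2 = [], []
    for start in range(0, len(lines), 8):
        r1.extend(lines[start:start + 4])      # first mate of the pair
        r2.extend(lines[start + 4:start + 8])  # second mate of the pair
    return r1, r2
```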

    commit 4ba524dc7b947feca8a0f743837844f9febc2171
    Author: Cole Lyman <[email protected]>
    Date:   Thu Aug 4 11:32:11 2022 -0600

        Potential fix for aggregate plots in Batch mode (#237)

    commit 6097a8a104d3f156ef7c08e196ac37e32bf04c71
    Author: Kendell Clement <[email protected]>
    Date:   Thu Jul 21 22:45:48 2022 -0400

        Fix pct_vectors in crispresso2_info json object

    commit 65a079d86d6f386793397398f839c46014b54543
    Author: Kendell Clement <[email protected]>
    Date:   Wed Jul 20 23:46:37 2022 -0400

        Fix more readme spelling bugs

    commit e817376ecd54cdea1f29e303ca25b9e7d1d38333
    Author: Kendell Clement <[email protected]>
    Date:   Wed Jul 20 23:42:23 2022 -0400

        Fix bug in readme spelling

    commit 49740ba1d66ed6d13a9e154b8b17bc8b5186581d
    Author: Kendell Clement <[email protected]>
    Date:   Wed Jul 20 16:10:09 2022 -0400

        Fix loading of crispresso info from WGS and Pooled

    commit b68a43271115251b18e8955e285ccc18f549e8cd
    Author: Kendell Clement <[email protected]>
    Date:   Thu Jul 14 14:11:04 2022 -0400

        Add plotly to dockerfile

    commit b0b7d41d697304d0d5fc93e3346c9de1b98ba41d
    Author: Kendell Clement <[email protected]>
    Date:   Thu Jul 14 14:10:00 2022 -0400

        Fix #231 Allow N's in bam output (Try 2)

    commit c460b3e73fd06a230dbac2e37c86b833144ebf94
    Author: Kendell Clement <[email protected]>
    Date:   Thu Jul 14 14:09:10 2022 -0400

        Revert "Fix #231 Allow N's in bam output"

        This reverts commit 2f6ad1dbe05210af9ccc1b1f17783cd212a888d3.

    commit 2f6ad1dbe05210af9ccc1b1f17783cd212a888d3
    Author: Kendell Clement <[email protected]>
    Date:   Thu Jul 14 13:52:37 2022 -0400

        Fix #231 Allow N's in bam output

    commit 0a2419e518dc9b3520058c3927f98b31cd51347e
    Author: Cole Lyman <[email protected]>
    Date:   Fri Jul 8 21:10:01 2022 -0600

        Fix bug when name is provided instead of amplicon_name in pooled input file (#229)

        Also, raise an exception (instead of incorrectly executing) when there are not
        enough matched parameters in the pooled input file.

    commit cb58212379803788c04ca5793baaa760cbbeaa81
    Author: Cole Lyman <[email protected]>
    Date:   Fri Jul 8 21:09:49 2022 -0600

        Fix bug when comparing two samples with the same name. (#228)

    commit e8a796f5f451409cbafed4404dfba4b6b8a124ca
    Author: Kendell Clement <[email protected]>
    Date:   Thu Jun 23 21:30:23 2022 -0400

        Version bump to 2.2.9

    commit 632143ddedea48bab9229baeb4bf3ea4d1f658d6
    Author: Cole Lyman <[email protected]>
    Date:   Mon Jun 20 19:53:14 2022 -0600

        Don't run global frameshift plot when there are no reads (#226)

        When there are no reads (i.e. global_MODIFIED_FRAMESHIFT +
        global_MODIFIED_NON_FRAMESHIFT + global_NON_MODIFIED_NON_FRAMESHIFT == 0) there
        was a bug when trying to compute the pie chart, because all of the values in the
        pie chart are 0. This fix makes sure that there is at least one read so that
        the plot can be constructed properly.
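
A minimal sketch of the guard described above, assuming the three global counts are plain integers. The function name and the `None` return convention are hypothetical, not the names used in CRISPResso2:

```python
def frameshift_pie_fractions(modified_frameshift,
                             modified_non_frameshift,
                             non_modified_non_frameshift):
    """Return the pie-chart fractions, or None when there are no reads."""
    counts = [modified_frameshift,
              modified_non_frameshift,
              non_modified_non_frameshift]
    total = sum(counts)
    if total == 0:
        # All wedges would be 0, so the caller should skip the plot.
        return None
    return [count / total for count in counts]
```

A caller would only draw the plot when the result is not `None`.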

    commit 4bb06218e835d2624d53fd401542caef6f8a3a55
    Author: kclem <[email protected]>
    Date:   Fri Jun 3 16:57:02 2022 -0400

        Improvements for guide inference in 'auto' mode

        In 'auto' mode, a putative guide sequence is selected at the site of maximal editing. If the site of maximal editing falls near the end of the guide (e.g. base 0), many things will break (e.g. quantification windows). This update excludes bases from the guide search using the --exclude_bp_from_left and --exclude_bp_from_right parameters. By default, these parameters are 15bp, so the first and last 15bp cannot be selected as the site of maximal editing and thus cannot be the site of a guide sequence. In addition, the site of maximal editing must have 3x the magnitude of the background.
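
The selection rule above might be sketched roughly as follows. This is an illustration only: the function name is hypothetical, and taking the "background" to be the mean count over the other eligible positions is an assumption, since the commit message does not say how the background is computed.

```python
def find_max_editing_site(mod_counts,
                          exclude_bp_from_left=15,
                          exclude_bp_from_right=15,
                          min_fold_over_background=3.0):
    """Return the index of maximal editing, or None if no site qualifies."""
    start = exclude_bp_from_left
    stop = len(mod_counts) - exclude_bp_from_right
    if start >= stop:
        return None  # nothing left after excluding both ends
    candidates = range(start, stop)
    best = max(candidates, key=lambda i: mod_counts[i])
    if mod_counts[best] == 0:
        return None  # no editing anywhere in the eligible region
    # Assumption: background = mean of the other eligible positions.
    background = [mod_counts[i] for i in candidates if i != best]
    if not background:
        return None
    mean_background = sum(background) / len(background)
    if mod_counts[best] < min_fold_over_background * mean_background:
        return None  # peak does not stand out enough
    return best
```

With this rule, a strong peak at base 2 is ignored because it lies inside the excluded left margin, and a flat profile yields no guide site at all.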

    commit 9d64de187835b2553ad2b4374d32edab27f83645
    Author: Kendell Clement <[email protected]>
    Date:   Thu Jun 2 20:22:25 2022 -0400

        Update README.md

    commit 6aafc5387986f5089ba55b68d128343d68052792
    Author: Simon P Shen <[email protected]>
    Date:   Tue May 31 17:42:53 2022 -0400

        directory in quotes in batch cmd (#222)

        Add quotes around output folder for folders that have spaces.
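
For illustration, one way to protect a folder name containing spaces when building a shell command is `shlex.quote` from the Python standard library. This is a swapped-in technique, not necessarily what the commit does (the message only says quotes were added), and the helper name is hypothetical:

```python
import shlex


def add_output_folder(base_cmd, output_folder):
    """Append the output folder to a command string, quoted for the shell."""
    # shlex.quote wraps the value in single quotes when it contains
    # characters (such as spaces) that the shell would otherwise split on.
    return f"{base_cmd} -o {shlex.quote(output_folder)}"


def split_command(cmd):
    """Split a command string the way a POSIX shell would."""
    return shlex.split(cmd)
```

Splitting the resulting command again keeps the folder as a single argument.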

    commit 432f163ac68b9a650d1fd326171aadc505ee87f4
    Author: Kendell Clement <[email protected]>
    Date:   Tue May 24 23:38:36 2022 -0400

        CRISPRessoBatch fills NA values in batch settings

 …
mbowcut2 added a commit to edilytics/CRISPResso2 that referenced this pull request Nov 8, 2024
* Change CRISPResso_status.txt format to JSON (#46)

* Bug Fix - 367 (#35)

* - Fixed references to ref_names_for_pe

* removed extra tabs

* trying to match empty line, no tabs

* - changed references to ref_names[0]

* Mckay/pd warnings (#45)

* refactor errors='ignore' to try except

* refactored integer slice to iloc[]

* moved to_numeric try except to function

* Refactor to_numeric_ignore_errors to to_numeric_ignore_columns

This change is slightly cleaner because it addresses the root issue that some
columns are strings (and can therefore not be converted to numeric types). Now
if an error does occur when converting the dfs to numeric types it won't be
swallowed up.
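
As a rough, dependency-free illustration of the idea (not the actual pandas-based `to_numeric_ignore_columns`), the sketch below converts every column except the ones known to hold strings. The function name and the dict-of-lists table are assumptions made for the example:

```python
def convert_numeric_columns(table, ignore_columns):
    """Convert all columns to float except those listed in ignore_columns."""
    converted = {}
    for name, values in table.items():
        if name in ignore_columns:
            # Known string column: copy it through untouched.
            converted[name] = list(values)
        else:
            # A bad value raises ValueError here instead of being swallowed.
            converted[name] = [float(value) for value in values]
    return converted
```

Because only the named columns are skipped, an unexpected non-numeric value in any other column surfaces as an error.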

* Add documentation to to_numeric_ignore_columns

---------

Co-authored-by: Cole Lyman <[email protected]>

---------

Co-authored-by: Cole Lyman <[email protected]>

* add json read for status file

* changed Formatter to json format

* fixed json access variable name: message

* changed percentage_complete to numeric

* changed status file to .json

* Create integration_tests.yml

* Simplify name

* CRISPRESSO2_DIR environment variable

* Up one dir

* ls workspace

* Install CRISPResso and ydiff

* Clone repo instead of checkout

* submodule

* ls

* CRISPResso2_copy

* ls

* Update env

* Simplify

* Pull from githubactions branch

* Pull githubactions repo

* Checkout githubactions

* Run tests individually

* Pin plotly version

* Run all tests even if one fails

* Test on another branch

* Switch branch with token

* Update integration_tests.yml

* New makefile commands

* changed file to .json

* changed status to json file

* Make JSON human readable by adding new lines
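
A small sketch of what a JSON status record with a numeric percentage and added new lines could look like. The key names `percent_complete` and `message` and both helper names are assumptions for illustration, not the confirmed format of the status file:

```python
import json


def format_status(percent_complete, message):
    """Serialize a status record as indented, human-readable JSON."""
    record = {
        "percent_complete": float(percent_complete),  # stored as a number
        "message": message,
    }
    # indent=2 puts each key on its own line; the trailing newline ends the file.
    return json.dumps(record, indent=2) + "\n"


def read_status(text):
    """Parse a status record written by format_status."""
    return json.loads(text)
```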

* GitHub actions integration tests (#48)

* GitHub actions clean (#40)

* Create pytest.yml

* Create pylint.yml

* Create .pylintrc

* Create test_env.yml

* Full path

* Remove conda install

* Replace path

* Pytest tests

* pip -e

* Create integration_tests.yml

* Simplify name

* CRISPRESSO2_DIR environment variable

* Up one dir

* ls workspace

* Install CRISPResso and ydiff

* Clone repo instead of checkout

* submodule

* ls

* CRISPResso2_copy

* ls

* Update env

* Simplify

* Pull from githubactions branch

* Pull githubactions repo

* Checkout githubactions

* Mckay/pd warnings (#45)

* refactor errors='ignore' to try except

* refactored integer slice to iloc[]

* moved to_numeric try except to function

* Refactor to_numeric_ignore_errors to to_numeric_ignore_columns

This change is slightly cleaner because it addresses the root issue that some
columns are strings (and can therefore not be converted to numeric types). Now
if an error does occur when converting the dfs to numeric types it won't be
swallowed up.

* Add documentation to to_numeric_ignore_columns

---------

Co-authored-by: Cole Lyman <[email protected]>

* Run tests individually

* Pin plotly version

* Run all tests even if one fails

* Test on another branch

* Switch branch with token

* Update integration_tests.yml

* Introduce pandas sorting in CRISPRessoCompare (#47)

* New makefile commands

* Fix interleaved fastq input in CRISPRessoPooled and suppress CRISPRessoWGS params (#42)

* Extract out split_interleaved_fastq function to CRISPRessoShared

* Implement splitting interleaved fastq files in CRISPRessoPooled

* Suppress split_interleaved_input from CRISPRessoWGS parameters

* Suppress other parameters in CRISPRessoWGS

* Move where interleaved fastq files are split to be trimmed properly

* Bug Fix - 367 (#35)

* - Fixed references to ref_names_for_pe

* removed extra tabs

* trying to match empty line, no tabs

* - changed references to ref_names[0]

* Mckay/pd warnings (#45)

* refactor errors='ignore' to try except

* refactored integer slice to iloc[]

* moved to_numeric try except to function

* Refactor to_numeric_ignore_errors to to_numeric_ignore_columns

This change is slightly cleaner because it addresses the root issue that some
columns are strings (and can therefore not be converted to numeric types). Now
if an error does occur when converting the dfs to numeric types it won't be
swallowed up.

* Add documentation to to_numeric_ignore_columns

---------

Co-authored-by: Cole Lyman <[email protected]>

---------

Co-authored-by: Cole Lyman <[email protected]>

* On push no branches

* On push no branches

* All in one file

* Fix yml errors

* Rename jobs

* Remove old workflow files

* Remove paths

* Run jobs in parallel

---------

Co-authored-by: mbowcut2 <[email protected]>
Co-authored-by: Cole Lyman <[email protected]>

* Move read filtering to after merging in CRISPResso (#39)

* Move read filtering to after merging

This is in an effort to be consistent with the behavior and results of
CRISPRessoPooled.

* Properly assign the correct file names for read filtering

* Add space around operators

* GitHub actions on pr (#51)

* Run integration tests on pull_request

* Run pytest on pull_request

* Run pylint on pull_request

* Run tests on PR only when opening PR (#53)

* Update reports (#52)

* Update report changes

* Switch branch of integration test repo

* Remove extraneous `crispresso_data_path`

* Point integration tests back to master

* point to test branch

* pointed CI config to testing branch

* Update integration_tests.yml

point to master

---------

Co-authored-by: Cole Lyman <[email protected]>
Co-authored-by: Samuel Nichols <[email protected]>

* Trevor/fastp integration (#50)

* Update check_program to check versions and create check_fastq function

* Update fastq arg, implement fastp in get_most_frequent_reads

* Bump version to 2.3.0

* Deprecate Flash and Trimmomatic parameters, and update fastp params

* Update guess_amplicons and guess_guides to remove max_paired_end_reads_overlap

* Implement trimming of single end reads

* Merge (and trim) reads in CRISPRessoCORE with fastp

* Modify error handling to account for fastp errors

* Replace flash and trimmomatic with fastp in Docker dependencies

* Update LICENSE.txt with fastp info

* Remove min and max amplicon length (no longer needed)

* Mckay/pd warnings (#45)

* refactor errors='ignore' to try except

* refactored integer slice to iloc[]

* moved to_numeric try except to function

* Refactor to_numeric_ignore_errors to to_numeric_ignore_columns

This change is slightly cleaner because it addresses the root issue that some
columns are strings (and can therefore not be converted to numeric types). Now
if an error does occur when converting the dfs to numeric types it won't be
swallowed up.

* Add documentation to to_numeric_ignore_columns

---------

Co-authored-by: Cole Lyman <[email protected]>

* Implement trimming with fastp in CRISPRessoPooled

* Implement merging (and trimming) with fastp in CRISPRessoPooled

* Fixed minor fastp errors

* Move read filtering to after merging in CRISPResso (#39)

* Move read filtering to after merging

This is in an effort to be consistent with the behavior and results of
CRISPRessoPooled.

* Properly assign the correct file names for read filtering

* Add space around operators

* GitHub actions on pr (#51)

* Run integration tests on pull_request

* Run pytest on pull_request

* Run pylint on pull_request

* Run tests on PR only when opening PR (#53)

* Update reports (#52)

* Update report changes

* Switch branch of integration test repo

* Remove extraneous `crispresso_data_path`

* Point integration tests back to master

* Update where the test point to

* Fix 'Prime-edited' key not found (#32)

* Move 'Prime-edited' amplicon name check

By moving this, it will check if there is an amplicon named
'Prime-edited' (which is a reserved name) even if the
`prime_editing_pegRNA_extension_seq` parameter is empty.
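
A minimal sketch of such a reserved-name check, run regardless of whether a pegRNA extension sequence was given. The function name, the constant name and the choice of `ValueError` are hypothetical:

```python
RESERVED_AMPLICON_NAME = "Prime-edited"


def check_amplicon_names(amplicon_names):
    """Raise if a user-supplied amplicon uses the reserved name."""
    # This check does not depend on prime_editing_pegRNA_extension_seq,
    # so it also runs when that parameter is empty.
    if RESERVED_AMPLICON_NAME in amplicon_names:
        raise ValueError(
            f"Amplicon name '{RESERVED_AMPLICON_NAME}' is reserved."
        )
    return list(amplicon_names)
```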

* Only search for scaffold integration when pegRNA extension seq is provided

* Remove spaces at the end of lines

* Docker size (#49)

* Bug Fix - 367 (#35)

* - Fixed references to ref_names_for_pe

* removed extra tabs

* trying to match empty line, no tabs

* - changed references to ref_names[0]

* Mckay/pd warnings (#45)

* refactor errors='ignore' to try except

* refactored integer slice to iloc[]

* moved to_numeric try except to function

* Refactor to_numeric_ignore_errors to to_numeric_ignore_columns

This change is slightly cleaner because it addresses the root issue that some
columns are strings (and can therefore not be converted to numeric types). Now
if an error does occur when converting the dfs to numeric types it won't be
swallowed up.

* Add documentation to to_numeric_ignore_columns

---------

Co-authored-by: Cole Lyman <[email protected]>

---------

Co-authored-by: Cole Lyman <[email protected]>

* GitHub actions integration tests (#48)

* GitHub actions clean (#40)

* Create pytest.yml

* Create pylint.yml

* Create .pylintrc

* Create test_env.yml

* Full path

* Remove conda install

* Replace path

* Pytest tests

* pip -e

* Create integration_tests.yml

* Simplify name

* CRISPRESSO2_DIR environment variable

* Up one dir

* ls workspace

* Install CRISPResso and ydiff

* Clone repo instead of checkout

* submodule

* ls

* CRISPResso2_copy

* ls

* Update env

* Simplify

* Pull from githubactions branch

* Pull githubactions repo

* Checkout githubactions

* Mckay/pd warnings (#45)

* refactor errors='ignore' to try except

* refactored integer slice to iloc[]

* moved to_numeric try except to function

* Refactor to_numeric_ignore_errors to to_numeric_ignore_columns

This change is slightly cleaner because it addresses the root issue that some
columns are strings (and can therefore not be converted to numeric types). Now
if an error does occur when converting the dfs to numeric types it won't be
swallowed up.

* Add documentation to to_numeric_ignore_columns

---------

Co-authored-by: Cole Lyman <[email protected]>

* Run tests individually

* Pin plotly version

* Run all tests even if one fails

* Test on another branch

* Switch branch with token

* Update integration_tests.yml

* Introduce pandas sorting in CRISPRessoCompare (#47)

* New makefile commands

* Fix interleaved fastq input in CRISPRessoPooled and suppress CRISPRessoWGS params (#42)

* Extract out split_interleaved_fastq function to CRISPRessoShared

* Implement splitting interleaved fastq files in CRISPRessoPooled

* Suppress split_interleaved_input from CRISPRessoWGS parameters

* Suppress other parameters in CRISPRessoWGS

* Move where interleaved fastq files are split to be trimmed properly

* Bug Fix - 367 (#35)

* - Fixed references to ref_names_for_pe

* removed extra tabs

* trying to match empty line, no tabs

* - changed references to ref_names[0]

* Mckay/pd warnings (#45)

* refactor errors='ignore' to try except

* refactored integer slice to iloc[]

* moved to_numeric try except to function

* Refactor to_numeric_ignore_errors to to_numeric_ignore_columns

This change is slightly cleaner because it addresses the root issue that some
columns are strings (and can therefore not be converted to numeric types). Now
if an error does occur when converting the dfs to numeric types it won't be
swallowed up.

* Add documentation to to_numeric_ignore_columns

---------

Co-authored-by: Cole Lyman <[email protected]>

---------

Co-authored-by: Cole Lyman <[email protected]>

* On push no branches

* On push no branches

* All in one file

* Fix yml errors

* Rename jobs

* Remove old workflow files

* Remove paths

* Run jobs in parallel

---------

Co-authored-by: mbowcut2 <[email protected]>
Co-authored-by: Cole Lyman <[email protected]>

* 3.4->2.08

* Put ttf-mscorefonts-installer back above apt-get clean

* restore slash, replace fastp with trimmomatic and flash, add autoremove step

---------

Co-authored-by: mbowcut2 <[email protected]>
Co-authored-by: Cole Lyman <[email protected]>

* initial readme modifications

* Updated readme to remove deprecated commands, updated help text to reflect new version and fastp

* Pointing test branch back at master

---------

Co-authored-by: Cole Lyman <[email protected]>
Co-authored-by: mbowcut2 <[email protected]>
Co-authored-by: Samuel Nichols <[email protected]>

* Guardrails clean history (#34)

* Include guardrail functions

* Add CRISPRessoReports subtree

* Refactor to use CRISPRessoReports module

* Include guardrail functions

* Functional guardrails, needs reports update

* Add guardrail partial

* fix guardrails partial

* Bug Fix - 367 (#35)

* - Fixed references to ref_names_for_pe

* removed extra tabs

* trying to match empty line, no tabs

* - changed references to ref_names[0]

* Mckay/pd warnings (#45)

* refactor errors='ignore' to try except

* refactored integer slice to iloc[]

* moved to_numeric try except to function

* Refactor to_numeric_ignore_errors to to_numeric_ignore_columns

This change is slightly cleaner because it addresses the root issue that some
columns are strings (and can therefore not be converted to numeric types). Now
if an error does occur when converting the dfs to numeric types it won't be
swallowed up.

* Add documentation to to_numeric_ignore_columns

---------

Co-authored-by: Cole Lyman <[email protected]>

---------

Co-authored-by: Cole Lyman <[email protected]>

* GitHub actions integration tests (#48)

* GitHub actions clean (#40)

* Create pytest.yml

* Create pylint.yml

* Create .pylintrc

* Create test_env.yml

* Full path

* Remove conda install

* Replace path

* Pytest tests

* pip -e

* Create integration_tests.yml

* Simplify name

* CRISPRESSO2_DIR environment variable

* Up one dir

* ls workspace

* Install CRISPResso and ydiff

* Clone repo instead of checkout

* submodule

* ls

* CRISPResso2_copy

* ls

* Update env

* Simplify

* Pull from githubactions branch

* Pull githubactions repo

* Checkout githubactions

* Mckay/pd warnings (#45)

* refactor errors='ignore' to try except

* refactored integer slice to iloc[]

* moved to_numeric try except to function

* Refactor to_numeric_ignore_errors to to_numeric_ignore_columns

This change is slightly cleaner because it addresses the root issue that some
columns are strings (and can therefore not be converted to numeric types). Now
if an error does occur when converting the dfs to numeric types it won't be
swallowed up.

* Add documentation to to_numeric_ignore_columns

---------

Co-authored-by: Cole Lyman <[email protected]>

* Run tests individually

* Pin plotly version

* Run all tests even if one fails

* Test on another branch

* Switch branch with token

* Update integration_tests.yml

* Introduce pandas sorting in CRISPRessoCompare (#47)

* New makefile commands

* Fix interleaved fastq input in CRISPRessoPooled and suppress CRISPRessoWGS params (#42)

* Extract out split_interleaved_fastq function to CRISPRessoShared

* Implement splitting interleaved fastq files in CRISPRessoPooled

* Suppress split_interleaved_input from CRISPRessoWGS parameters

* Suppress other parameters in CRISPRessoWGS

* Move where interleaved fastq files are split to be trimmed properly

* Bug Fix - 367 (#35)

* - Fixed references to ref_names_for_pe

* removed extra tabs

* trying to match empty line, no tabs

* - changed references to ref_names[0]

* Mckay/pd warnings (#45)

* refactor errors='ignore' to try except

* refactored integer slice to iloc[]

* moved to_numeric try except to function

* Refactor to_numeric_ignore_errors to to_numeric_ignore_columns

This change is slightly cleaner because it addresses the root issue that some
columns are strings (and can therefore not be converted to numeric types). Now
if an error does occur when converting the dfs to numeric types it won't be
swallowed up.

* Add documentation to to_numeric_ignore_columns

---------

Co-authored-by: Cole Lyman <[email protected]>

---------

Co-authored-by: Cole Lyman <[email protected]>

* On push no branches

* On push no branches

* All in one file

* Fix yml errors

* Rename jobs

* Remove old workflow files

* Remove paths

* Run jobs in parallel

---------

Co-authored-by: mbowcut2 <[email protected]>
Co-authored-by: Cole Lyman <[email protected]>

* Update C cythonized files

* Add exact numbers to guardrails printouts

* Remove extraneous whitespace from CRISPRessoCOREResources.pyx

* Fix calculation of `total_mods` from being negative

The issue was that `all_deletion_coordinates` just tells you how many deletions
were present, but not how long the deletion is.
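
To make the distinction concrete, the sketch below contrasts the number of deletions with the number of deleted bases. Representing each deletion as a half-open `(start, end)` pair is an assumption for the example and may not match how `all_deletion_coordinates` is stored:

```python
def count_deletions(deletion_ranges):
    """Number of deletion events, regardless of their length."""
    return len(deletion_ranges)


def count_deleted_bases(deletion_ranges):
    """Total number of bases removed across all deletion events."""
    return sum(end - start for start, end in deletion_ranges)
```

For one 5 bp deletion and one 1 bp deletion, the first function gives 2 while the second gives 6, so mixing the two in one calculation can produce inconsistent totals.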

* Changes to message

* Remove old tag

* Point tests at guardrails

* Restore C2 pro check

* Save message with guardrail name

* Point tests repo at master

---------

Co-authored-by: Cole Lyman <[email protected]>
Co-authored-by: mbowcut2 <[email protected]>

* Fix case sensitivity in Prime Editing mode (#54)

* Move read filtering to after merging in CRISPResso (#39)

* Move read filtering to after merging

This is in an effort to be consistent with the behavior and results of
CRISPRessoPooled.

* Properly assign the correct file names for read filtering

* Add space around operators

* GitHub actions on pr (#51)

* Run integration tests on pull_request

* Run pytest on pull_request

* Run pylint on pull_request

* Run tests on PR only when opening PR (#53)

* Update reports (#52)

* Update report changes

* Switch branch of integration test repo

* Remove extraneous `crispresso_data_path`

* Point integration tests back to master

* Make all amplicons in amplicon_seq_arr uppercase

This fixes https://github.com/pinellolab/CRISPResso2/issues/396

* Allow RNA values to be provided for prime_editing_pegRNA_scaffold_seq
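
The two normalizations above (uppercasing amplicons and accepting RNA input) could look roughly like this. The function names are hypothetical, and replacing `U` with `T` is an assumed way of accepting RNA values:

```python
def normalize_amplicons(amplicon_seq_arr):
    """Uppercase every amplicon so comparisons are case-insensitive."""
    return [seq.upper() for seq in amplicon_seq_arr]


def rna_to_dna(seq):
    """Uppercase a sequence and convert RNA bases (U) to DNA bases (T)."""
    return seq.upper().replace("U", "T")
```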

* Fix 'Prime-edited' key not found (#32)

* Move 'Prime-edited' amplicon name check

By moving this, it will check if there is an amplicon named
'Prime-edited' (which is a reserved name) even if the
`prime_editing_pegRNA_extension_seq` parameter is empty.

* Only search for scaffold integration when pegRNA extension seq is provided

* Remove spaces at the end of lines

* Docker size (#49)

* Bug Fix - 367 (#35)

* - Fixed references to ref_names_for_pe

* removed extra tabs

* trying to match empty line, no tabs

* - changed references to ref_names[0]

* Mckay/pd warnings (#45)

* refactor errors='ignore' to try except

* refactored integer slice to iloc[]

* moved to_numeric try except to function

* Refactor to_numeric_ignore_errors to to_numeric_ignore_columns

This change is slightly cleaner because it addresses the root issue that some
columns are strings (and can therefore not be converted to numeric types). Now
if an error does occur when converting the dfs to numeric types it won't be
swallowed up.

* Add documentation to to_numeric_ignore_columns

---------

Co-authored-by: Cole Lyman <[email protected]>

---------

Co-authored-by: Cole Lyman <[email protected]>

* GitHub actions integration tests (#48)

* GitHub actions clean (#40)

* Create pytest.yml

* Create pylint.yml

* Create .pylintrc

* Create test_env.yml

* Full path

* Remove conda install

* Replace path

* Pytest tests

* pip -e

* Create integration_tests.yml

* Simplify name

* CRISPRESSO2_DIR environment variable

* Up one dir

* ls workspace

* Install CRISPResso and ydiff

* Clone repo instead of checkout

* submodule

* ls

* CRISPResso2_copy

* ls

* Update env

* Simplify

* Pull from githubactions branch

* Pull githubactions repo

* Checkout githubactions

* Mckay/pd warnings (#45)

* refactor errors='ignore' to try except

* refactored integer slice to iloc[]

* moved to_numeric try except to function

* Refactor to_numeric_ignore_errors to to_numeric_ignore_columns

This change is slightly cleaner because it addresses the root issue that some
columns are strings (and can therefore not be converted to numeric types). Now
if an error does occur when converting the dfs to numeric types it won't be
swallowed up.

* Add documentation to to_numeric_ignore_columns

---------

Co-authored-by: Cole Lyman <[email protected]>

* Run tests individually

* Pin plotly version

* Run all tests even if one fails

* Test on another branch

* Switch branch with token

* Update integration_tests.yml

* Introduce pandas sorting in CRISPRessoCompare (#47)

* New makefile commands

* Fix interleaved fastq input in CRISPRessoPooled and suppress CRISPRessoWGS params (#42)

* Extract out split_interleaved_fastq function to CRISPRessoShared

* Implement splitting interleaved fastq files in CRISPRessoPooled

* Suppress split_interleaved_input from CRISPRessoWGS parameters

* Suppress other parameters in CRISPRessoWGS

* Move where interleaved fastq files are split to be trimmed properly

* Bug Fix - 367 (#35)

* - Fixed references to ref_names_for_pe

* removed extra tabs

* trying to match empty line, no tabs

* - changed references to ref_names[0]

* Mckay/pd warnings (#45)

* refactor errors='ignore' to try except

* refactored integer slice to iloc[]

* moved to_numeric try except to function

* Refactor to_numeric_ignore_errors to to_numeric_ignore_columns

This change is slightly cleaner because it addresses the root issue that some
columns are strings (and can therefore not be converted to numeric types). Now
if an error does occur when converting the dfs to numeric types it won't be
swallowed up.

* Add documentation to to_numeric_ignore_columns

---------

Co-authored-by: Cole Lyman <[email protected]>

---------

Co-authored-by: Cole Lyman <[email protected]>

* On push no branches

* On push no branches

* All in one file

* Fix yml errors

* Rename jobs

* Remove old workflow files

* Remove paths

* Run jobs in parallel

---------

Co-authored-by: mbowcut2 <[email protected]>
Co-authored-by: Cole Lyman <[email protected]>

* 3.4->2.08

* Put ttf-mscorefonts-installer back above apt-get clean

* restore slash, replace fastp with trimmomatic and flash, add autoremove step

---------

Co-authored-by: mbowcut2 <[email protected]>
Co-authored-by: Cole Lyman <[email protected]>

* Guardrails clean history (#34)

* Include guardrail functions

* Add CRISPRessoReports subtree

* Refactor to use CRISPRessoReports module

* Include guardrail functions

* Functional guardrails, needs reports update

* Add guardrail partial

* fix guardrails partial

* Bug Fix - 367 (#35)

* - Fixed references to ref_names_for_pe

* removed extra tabs

* trying to match empty line, no tabs

* - changed references to ref_names[0]

* Mckay/pd warnings (#45)

* refactor errors='ignore' to try except

* refactored integer slice to iloc[]

* moved to_numeric try except to function

* Refactor to_numeric_ignore_errors to to_numeric_ignore_columns

This change is slightly cleaner because it addresses the root issue that some
columns are strings (and can therefore not be converted to numeric types). Now
if an error does occur when converting the dfs to numeric types it won't be
swallowed up.

* Add documentation to to_numeric_ignore_columns

---------

Co-authored-by: Cole Lyman <[email protected]>

---------

Co-authored-by: Cole Lyman <[email protected]>

* GitHub actions integration tests (#48)

* GitHub actions clean (#40)

* Create pytest.yml

* Create pylint.yml

* Create .pylintrc

* Create test_env.yml

* Full path

* Remove conda install

* Replace path

* Pytest tests

* pip -e

* Create integration_tests.yml

* Simplify name

* CRISPRESSO2_DIR environment variable

* Up one dir

* ls workspace

* Install CRISPResso and ydiff

* Clone repo instead of checkout

* submodule

* ls

* CRISPResso2_copy

* ls

* Update env

* Simplify

* Pull from githubactions branch

* Pull githubactions repo

* Checkout githubactions

* Mckay/pd warnings (#45)

* refactor errors='ignore' to try except

* refactored integer slice to iloc[]

* moved to_numeric try except to function

* Refactor to_numeric_ignore_errors to to_numeric_ignore_columns

This change is slightly cleaner because it addresses the root issue that some
columns are strings (and can therefore not be converted to numeric types). Now
if an error does occur when converting the dfs to numeric types it won't be
swallowed up.

* Add documentation to to_numeric_ignore_columns

---------

Co-authored-by: Cole Lyman <[email protected]>

* Run tests individually

* Pin plotly version

* Run all tests even if one fails

* Test on another branch

* Switch branch with token

* Update integration_tests.yml

* Introduce pandas sorting in CRISPRessoCompare (#47)

* New makefile commands

* Fix interleaved fastq input in CRISPRessoPooled and suppress CRISPRessoWGS params (#42)

* Extract out split_interleaved_fastq function to CRISPRessoShared

* Implement splitting interleaved fastq files in CRISPRessoPooled

* Suppress split_interleaved_input from CRISPRessoWGS parameters

* Suppress other parameters in CRISPRessoWGS

* Move where interleaved fastq files are split to be trimmed properly

* Bug Fix - 367 (#35)

* - Fixed references to ref_names_for_pe

* removed extra tabs

* trying to match empty line, no tabs

* - changed references to ref_names[0]

* Mckay/pd warnings (#45)

* refactor errors='ignore' to try except

* refactored integer slice to iloc[]

* moved to_numeric try except to function

* Refactor to_numeric_ignore_errors to to_numeric_ignore_columns

This change is slightly cleaner because it addresses the root issue that some
columns are strings (and can therefore not be converted to numeric types). Now
if an error does occur when converting the dfs to numeric types it won't be
swallowed up.

* Add documentation to to_numeric_ignore_columns

---------

Co-authored-by: Cole Lyman <[email protected]>

---------

Co-authored-by: Cole Lyman <[email protected]>

* On push no branches

* On push no branches

* All in one file

* Fix yml errors

* Rename jobs

* Remove old workflow files

* Remove paths

* Run jobs in parallel

---------

Co-authored-by: mbowcut2 <[email protected]>
Co-authored-by: Cole Lyman <[email protected]>

* Update C cythonized files

* Add exact numbers to guardrails printouts

* Remove extraneous whitespace from CRISPRessoCOREResources.pyx

* Fix calculation of `total_mods` from being negative

The issue was that `all_deletion_coordinates` just tells you how many deletions
were present, but not how long the deletion is.

* Changes to message

* Remove old tag

* Point tests at guardrails

* Restore C2 pro check

* Save message with guardrail name

* Point tests repo at master

---------

Co-authored-by: Cole Lyman <[email protected]>
Co-authored-by: mbowcut2 <[email protected]>

---------

Co-authored-by: Samuel Nichols <[email protected]>
Co-authored-by: mbowcut2 <[email protected]>
Co-authored-by: trevormartinj7 <[email protected]>

* Batch d3 clean (#55)

* imports C2Pro plots if available

* added --use_matplotlib flag

* added C2Pro
matched api function signatures

* added api args for plotly

* added **kwargs

* renamed config to custom_config, more specificity

* added backend flag for plotly kaleido

* added pro_installed boolean for templates, added plotly dependency to report templates

* Squashed commit of the following:

commit c909ea3b34e87ce637e00dac075d2bb2f8bfb954
Author: McKay <[email protected]>
Date:   Thu Feb 15 15:55:23 2024 -0700

    added plotly dependency for pro

commit 76b3601f6a0144f100266153f1c999e0c5de65de
Author: Samuel Nichols <[email protected]>
Date:   Fri Jan 12 09:56:19 2024 -0700

    Squashed commit of the following:

    commit 603f2eff9d1aa21ae95f3e134da303b8018d3a33
    Author: Samuel Nichols <[email protected]>
    Date:   Fri Jan 12 09:48:20 2024 -0700

        fix guardrails partial

    commit 22fc03183a8070c30dfb74d5c23575ac19019855
    Author: Samuel Nichols <[email protected]>
    Date:   Fri Jan 12 08:54:01 2024 -0700

        Add guardrail partial

    commit e55f6b21972b578261bc5a864ce1d653d98f9e34
    Author: Samuel Nichols <[email protected]>
    Date:   Mon Jan 8 07:50:59 2024 -0700

        Functional guardrails, needs reports update

    commit 6e968e9699ed59a47d88191d03768e042d8b60a4
    Merge: 32b49685 e948ce10
    Author: Samuel Nichols <[email protected]>
    Date:   Mon Dec 18 13:34:36 2023 -0700

        Merge branch 'guardrails-clean-history' of https://github.com/edilytics/CRISPResso2 into guardrails-clean-history

    commit 32b49685da320501dad2b0ebbb57887b66220ba8
    Author: Samuel Nichols <[email protected]>
    Date:   Fri Dec 15 15:27:04 2023 -0700

        Include guardrail functions

    commit 4e309cf6f732565d635de3d4c5d074ada3027e2d
    Author: Cole Lyman <[email protected]>
    Date:   Mon Dec 18 10:51:55 2023 -0700

        Refactor to use CRISPRessoReports module

    commit e648dc087c0055bc5d2fca13c64071a371dea941
    Author: Cole Lyman <[email protected]>
    Date:   Mon Dec 18 10:51:11 2023 -0700

        Add CRISPRessoReports subtree

    commit e948ce107ebb0d1d99010ed12e937f34b5e607d4
    Author: Samuel Nichols <[email protected]>
    Date:   Fri Dec 15 15:27:04 2023 -0700

        Include guardrail functions

    commit d33c748871a625facfe8d792e29c77ab9779138f
    Author: Kendell Clement <[email protected]>
    Date:   Tue Nov 7 16:31:06 2023 -0700

        Include parameter --assign_ambiguous_alignments_to_first_reference in readme

    commit a1435f7f491a6a61434f3051e39f39a4c9bf1edc
    Author: Kendell Clement <[email protected]>
    Date:   Wed Oct 11 17:17:30 2023 -0600

        Enable quantification by sgRNA (#348)

        This PR includes:
        - storing the sgRNA-specific editing locations in the crispresso2_info object. Previously, each amplicon would record the indices of quantification windows across the guide, but not for individual guides. This stores the information for each guide in crispresso2_info['results']['refs'][reference_name]['sgRNA_include_idxs']
        - a script (count_sgRNA_specific_edits.py) to parse through an allele table output from a completed CRISPResso run (`--write_detailed_allele_table` flag required) to count edits in each sgRNA separately.

        I don't have a good double-edited sample handy, but it can be run on the demo HDR data [hdr.fastq.gz](http://crispresso.pinellolab.org/static/demo/hdr.fastq.gz) using the command:

        ```

        CRISPResso -r1 hdr.fastq.gz -a acatttgcttctgacacaactgtgttcactagcaacctcaaacagacaccatggtgcatctgactcctgTggagaagtctgccgttactgccctgtggggcaaggtgaacgtggatgaagttggtggtgaggccctgggcaggttggtatcaaggtta -e acatttgcttctgacacaactgtgttcactagcaacctcaaacagacaccatggtgcaCctgactccGgaggagaagtctgccgttactgcGctgtggggcaaggtgaacgtggatgaagttggtggtgaggccctgggcaggttggtatcaaggtta -c atggtgcatctgactcctgTggagaagtctgccgttactgccctgtggggcaaggtgaacgtggatgaagttggtggtgaggccctgggcag -g TGCACCATGGTGTCTGTTTG,GATGAAGTTGGTGGTGAGGCCC --write_detailed_allele_table  -n hdr3 -p max -gn guide1,guide2
        ```

        ```
        python CRISPResso2/scripts/count_sgRNA_specific_edits.py -f CRISPResso_on_hdr3
        ```

        This produces:
        ```
        Processed 25000 alleles
        Reference: Reference (2391/23415 modified reads)
                UNMODIFIED: 21024
                MODIFIED guide1: 2359
                MODIFIED guide2: 32
        Reference: HDR (856/1577 modified reads)
                UNMODIFIED: 721
                MODIFIED guide1: 854
                MODIFIED guide1 + guide2: 1
                MODIFIED guide2: 1
         ```
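
A rough sketch of how per-guide counting could work, assuming each guide's quantification window is stored as a list of reference indices under `sgRNA_include_idxs`. The contents of `info`, the use of guide names as keys, and the helper `guides_modified` are made up for illustration; the real layout in CRISPResso2 may differ.

```python
# Hypothetical miniature of the structure described above; the real
# crispresso2_info object holds much more, and its exact layout may differ.
info = {
    "results": {
        "refs": {
            "Reference": {
                # Assumed for this sketch: one list of reference indices per guide
                "sgRNA_include_idxs": {
                    "guide1": [48, 49, 50, 51],
                    "guide2": [120, 121, 122, 123],
                },
            }
        }
    }
}


def guides_modified(info, reference_name, modified_positions):
    """Return the names of guides whose window contains a modified position."""
    windows = info["results"]["refs"][reference_name]["sgRNA_include_idxs"]
    modified = set(modified_positions)
    return sorted(name for name, idxs in windows.items() if modified & set(idxs))


print(guides_modified(info, "Reference", [50]))       # ['guide1']
print(guides_modified(info, "Reference", [50, 121]))  # ['guide1', 'guide2']
print(guides_modified(info, "Reference", [10]))       # []
```

An allele modified in both windows is reported under both guides, which corresponds to the `MODIFIED guide1 + guide2` category in the output above.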

    commit 2e3da02fdbed2fa8ae02a277763d65a502459827
    Author: Cole Lyman <[email protected]>
    Date:   Tue Oct 10 15:29:08 2023 -0600

        changed tuple to list for matplotlib change (#31) (#346)

        Co-authored-by: mbowcut2 <[email protected]>

    commit cd3c332135fe4db0f9218e3d87263d5c65838ed9
    Author: Kendell Clement <[email protected]>
    Date:   Sun Oct 1 01:54:46 2023 -0600

        rename script to camel case

    commit 7c719d65fb36ac7654db9040f226564ea28fcab9
    Author: Kendell Clement <[email protected]>
    Date:   Sun Oct 1 01:53:44 2023 -0600

        Add new script for counting high quality bases

    commit f97cd2795e89464bcc9321ccfdbca3e6af2bcb4f
    Author: Kendell Clement <[email protected]>
    Date:   Thu Sep 14 15:15:30 2023 -0600

        Prime editing alignment params (#336)

        Adds two parameters to control alignment of pegRNA components: --prime_editing_gap_open_penalty and --prime_editing_gap_extend_penalty.

        CRISPResso checks to see whether the pegRNA spacer and extension sequence are in the correct orientation, but sometimes they could align in the incorrect orientation with a higher score (e.g. via insertion of multiple gaps, whereas a single long gap would be preferred). Introducing these two parameters allows users to adjust the alignment parameters specifically for these prime-editing checks without adjusting the global alignment parameters which will be applied to reads that are aligned to the WT reference/prime-editing reference sequences.

        The new prime_editing_gap_open_penalty is set to -50, a higher gap open penalty than the default needleman_wunsch_gap_open penalty (-20). This commit breaks backward-reproducibility, but mostly in the checking of pegRNA component orientation - so previously some CRISPResso runs would have failed and produced an error, but now they will (hopefully) succeed. To achieve complete backward reproducibility, add the flag --prime_editing_gap_open_penalty -20 to runs.
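
The effect of a stronger gap open penalty can be illustrated with a little arithmetic. The sketch below is not CRISPResso's aligner: it assumes a gap of length n costs `gap_open + (n - 1) * gap_extend`, assumes a gap extend penalty of -2, and uses invented match scores for two candidate alignments.

```python
def alignment_score(match_score, gap_lengths, gap_open, gap_extend=-2):
    """Total score = match score + affine gap costs.

    Assumed convention: a gap of length n costs gap_open + (n - 1) * gap_extend.
    """
    gap_cost = sum(gap_open + (n - 1) * gap_extend for n in gap_lengths)
    return match_score + gap_cost


def preferred(gap_open):
    """Name the higher-scoring of two hypothetical alignments with 10 gapped bases."""
    one_long = alignment_score(match_score=100, gap_lengths=[10], gap_open=gap_open)
    three_short = alignment_score(match_score=145, gap_lengths=[4, 3, 3], gap_open=gap_open)
    return "one long gap" if one_long > three_short else "three short gaps"


print(preferred(-20))  # three short gaps (62 vs 71)
print(preferred(-50))  # one long gap (32 vs -19)
```

With the weaker penalty, the extra matches gained by scattering gaps outweigh the cost of opening them; at -50, each additional gap opening is expensive enough that the single long gap wins.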

    commit 64cbf36dae85cffa2c15e73f2a7ee8aa1077d917
    Author: Cole Lyman <[email protected]>
    Date:   Thu Sep 7 16:43:30 2023 -0600

        Fix samtools piping (#325)

        * Remove samtools pipe stderr to stdout

        Sometimes some of the libraries that samtools depends on don't have the correct
        version information, and as such samtools will report this to stderr when run.
        Because we pipe the output of samtools, we expect it to be valid SAM format, but
        when these library version messages are reported, it breaks CRISPRessoWGS.

        * Remove extra spacing at end of lines and add missing comma in WGS

        * Log stderr from samtools in CRISPRessoWGS
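
The difference between merging and separating the two streams can be shown with Python's `subprocess` module. This is a minimal sketch: `fake_tool` is a hypothetical stand-in for samtools that prints one SAM-like line to stdout and a warning to stderr; it is not the command CRISPRessoWGS runs.

```python
import subprocess
import sys

# Hypothetical stand-in for a tool like samtools: one record on stdout,
# one library-version warning on stderr.
fake_tool = [
    sys.executable, "-c",
    "import sys; sys.stderr.write('warning: library version mismatch\\n'); "
    "sys.stdout.write('read1\\t0\\tchr1\\t100\\n')",
]

# Merged streams: the warning ends up in the data that will be parsed.
merged = subprocess.run(fake_tool, stdout=subprocess.PIPE,
                        stderr=subprocess.STDOUT, text=True)

# Separate streams: stdout stays clean and stderr can be logged.
separate = subprocess.run(fake_tool, stdout=subprocess.PIPE,
                          stderr=subprocess.PIPE, text=True)

print(len(merged.stdout.splitlines()))    # 2 lines: record plus warning
print(len(separate.stdout.splitlines()))  # 1 line: record only
print(separate.stderr.strip())            # the warning, available for logging
```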

    commit 8feff4101f27406d9d88ace97d31a518276bff3f
    Author: Cole Lyman <[email protected]>
    Date:   Fri Sep 1 09:43:56 2023 -0600

        Replace link to CRISPResso schematic with raw URL in README (#329)

        * Replace link to CRISPResso schematic with raw URL

        * Add new lines to the beginning of unordered lists

    commit 2e9e6bff5bcc536d5e2ba1440d1ab96d9d47efd6
    Author: Kendell Clement <[email protected]>
    Date:   Thu Aug 10 00:52:12 2023 -0600

        Try to unbreak CircleCI

    commit ae5b95246cb0f6d66c4cbfb50cf8f5a9626b0827
    Author: Kendell Clement <[email protected]>
    Date:   Thu Aug 10 00:17:27 2023 -0600

        Center command line text messages

    commit 4d9c71ecf2248c9bb1e10430178dc318b6621c8b
    Author: Kendell Clement <[email protected]>
    Date:   Thu Aug 10 00:17:07 2023 -0600

        Fix bug in prime-editing scaffold-incorporation plotting

        If read is too short, scaffold incorporation detection will fail because it will check beyond the length of the read.

    commit 2b36a1a5c35e8a93516ce8baf464595615e0f402
    Author: Kendell Clement <[email protected]>
    Date:   Wed Aug 9 15:29:48 2023 -0600

        CRISPRessoPooled --compile_postrun_references bug fixes

    commit 3e04d1d402bcf95edd39fc7c8c9af61bb380f9db
    Author: Kendell Clement <[email protected]>
    Date:   Tue Aug 8 23:30:15 2023 -0600

        Fix missing ' in Pooled --demultiplex_only_at_amplicons

    commit 06af527f9e2020c5cf251e7f1cec0b1eca1c1664
    Author: Cole Lyman <[email protected]>
    Date:   Mon Jul 24 10:47:46 2023 -0600

        Sort pandas dataframes by # of reads and sequences so that the order is consistent (#316)

        * Make sorting stable

        * Including c files

        * Sort by #Reads instead of %Reads to avoid floating point errors
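
The idea behind the consistent ordering can be sketched in plain Python (the commit itself works on pandas dataframes; this is only an analogue). The rows and the helper `sort_alleles` are invented for illustration.

```python
# Hypothetical allele rows: (sequence, read count)
alleles = [("ACGT", 3), ("TTGA", 5), ("AAGT", 3), ("CCGT", 5)]


def sort_alleles(rows):
    """Sort by read count (descending), breaking ties by sequence.

    Integer counts compare exactly, whereas percentages derived from them
    are floats and may differ in their last digits.
    """
    return sorted(rows, key=lambda row: (-row[1], row[0]))


print(sort_alleles(alleles))
# [('CCGT', 5), ('TTGA', 5), ('AAGT', 3), ('ACGT', 3)]

# The result does not depend on the input order
print(sort_alleles(list(reversed(alleles))) == sort_alleles(alleles))  # True
```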

        ---------

        Co-authored-by: Samuel Nichols <[email protected]>

    commit de05533b3511a84f3b6b14fc2ef64db041613261
    Author: Cole Lyman <[email protected]>
    Date:   Thu Jul 6 13:54:45 2023 -0600

        Fix multiprocessing lambda pickling (#311)

        * Fix running plots in parallel

        The reason the plots were running slower before this change is because I was
        calling the plot function, not passing it to `submit`. So it was essentially
        running in serial, but worse because it was still spinning up/down the
        processes.

        * Fix multiprocessing lambda pickling (#20)

        * Refactor process_futures to be a dict

        This makes debugging much easier because you can associate the arguments to the
        future with the results.

        * Fix the pickling error when running in multiprocessing

        Only top-level functions (not lambdas) can be pickled to use in multiprocessing
        pools, thus the lambdas are converted to a regular function.

        * Further fixes to pickling multiprocessing error (#21)

        * Refactor process_futures to be a dict

        This makes debugging much easier because you can associate the arguments to the
        future with the results.

        * Fix the pickling error when running in multiprocessing

        Only top-level functions (not lambdas) can be pickled to use in multiprocessing
        pools, thus the lambdas are converted to a regular function.

        * Use Counter instead of defaultdict in CRISPRessoCORE

        * Update process_futures to dict in Batch and Aggregate
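
The pickling restriction mentioned above can be checked directly with the `pickle` module, which process pools use to send work to workers. In this sketch, `can_pickle` is a hypothetical helper, and standard-library functions stand in for a project's own module-level functions.

```python
import math
import pickle
from functools import partial


def can_pickle(obj):
    """Return True if obj survives pickle.dumps, which process pools rely on."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


print(can_pickle(lambda x: x + 1))       # False: a lambda has no importable name
print(can_pickle(math.sqrt))             # True: pickled by reference to math.sqrt
print(can_pickle(partial(math.pow, 2)))  # True: partial of an importable function
```

Functions are pickled by name rather than by value, so anything that cannot be looked up as `module.name` in the worker process is rejected.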

    commit ebb016dff46c280dce8c3c09e8ac0e0cc25d4d74
    Author: Kendell Clement <[email protected]>
    Date:   Mon Jul 3 17:12:09 2023 -0600

        Enable CRISPRessoPooled multiprocessing when os allows multi-thread file append

    commit 7285da0e987b77b72c8885bb35940e0f50c146bd
    Author: Kendell Clement <[email protected]>
    Date:   Fri Jun 23 16:50:33 2023 -0600

        Fix print bug for invalid fastq

    commit 9acdeac67441f9a1d55ac94b153bcb68fb89b92c
    Author: kclem <[email protected]>
    Date:   Wed Jun 21 16:03:48 2023 -0600

        Slugify before creating filename - replaces invalid characters in batch names with _

    commit f97e29c67de4c80b8d6b9cf334f363be4b514ade
    Author: Cole Lyman <[email protected]>
    Date:   Wed Jun 21 14:43:43 2023 -0600

        Add verbosity argument to CRISPRessoAggregate (#18) fixes #306 (#307)

        * Add verbosity argument to CRISPRessoAggregate (#18)

        * Allow for amplicon and guide seqs to be some variant of NA in batch (#19)

        This was discovered when attempting to infer amplicon sequences in batch mode on
        the web interface, NAs were supplied for the amplicon sequences to the sub
        CRISPResso commands.

    commit 32e1e9797da5c3033cdc588e92f06b8813961953
    Author: Mark Clement <[email protected]>
    Date:   Wed Jun 21 14:01:00 2023 -0600

        Allow for interrogation of overlapping sgRNA sites

    commit 7248ba8c4deee125ad1ec12fdf1294a84d5f6f93
    Author: Kendell Clement <[email protected]>
    Date:   Mon Jun 12 12:16:47 2023 -0600

        Check input fastq file format

        Asserts input format of fastq files - including if gzipped files are missing the gz suffix.
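
One way such a check could be written is sketched below; it is an assumption about the approach, not the code added in this commit. Gzip data is recognised by its two magic bytes rather than by the file suffix, and the helper names are hypothetical.

```python
import gzip
import os
import tempfile


def is_gzipped(path):
    """Detect gzip data by its magic bytes (0x1f 0x8b), ignoring the file name."""
    with open(path, "rb") as handle:
        return handle.read(2) == b"\x1f\x8b"


def first_record_looks_like_fastq(path):
    """Minimal check: '@' header, '+' separator, sequence and quality of equal length."""
    opener = gzip.open if is_gzipped(path) else open
    with opener(path, "rt") as handle:
        lines = [handle.readline().rstrip("\n") for _ in range(4)]
    return (lines[0].startswith("@") and lines[2].startswith("+")
            and len(lines[1]) == len(lines[3]))


# Demo: a gzipped FASTQ saved without the .gz suffix
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "reads.fastq")
    with gzip.open(demo_path, "wt") as out:
        out.write("@read1\nACGT\n+\nIIII\n")
    print(is_gzipped(demo_path), first_record_looks_like_fastq(demo_path))  # True True
```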

    commit 83c8ab8f462e7d8c1d04c08c1a398b874f517251
    Author: Kendell Clement <[email protected]>
    Date:   Mon Jun 5 13:41:55 2023 -0600

        Fix CRISPRessoArgParser

    commit 14a2c8577f566e1b72d5f4e72cd6cd22079610be
    Author: Kendell Clement <[email protected]>
    Date:   Mon Jun 5 13:29:31 2023 -0600

        Cosmetic updates for command-line use

        - version bump to 2.2.13
        - If no args are provided, the command line version will print out an abbreviated help message
        - parameters can be excluded from CRISPRessoArgParser

    commit 1cd54bc1d03360c3d8121ba9e66b3589fe1cf252
    Author: Cole Lyman <[email protected]>
    Date:   Thu May 11 14:31:47 2023 -0600

        Fix multiprocessing error, don't start pool when only using single thread (#302)

        * Update README to have consistent use of `--base_editor_output` (#16)

        * Add files via upload

        * Only start process pools when using multiple processes

        This is mainly to solve the issue when running on AWS Lambda, but this should
        improve single core performance overall.
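
The pattern of skipping the pool for single-process runs might look like the sketch below. `run_all` is a hypothetical helper, not the function in CRISPRessoMultiProcessing.

```python
from concurrent.futures import ProcessPoolExecutor


def run_all(func, items, n_processes=1):
    """Apply func to each item, creating a process pool only when needed."""
    if n_processes <= 1:
        # Serial path: no pool start-up cost, and it still works in
        # environments where process pools cannot be created.
        return [func(item) for item in items]
    with ProcessPoolExecutor(max_workers=n_processes) as pool:
        return list(pool.map(func, items))


print(run_all(abs, [-1, 2, -3]))  # [1, 2, 3]
```

When `n_processes` is greater than 1, `func` must be picklable, and on platforms that spawn worker processes the call should sit under an `if __name__ == "__main__":` guard.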

        ---------

        Co-authored-by: Kendell Clement <[email protected]>

    commit 92a705c939b370373a70cf6ae9f1616de33288b9
    Author: Cole Lyman <[email protected]>
    Date:   Thu May 11 14:31:06 2023 -0600

        Update `base_editor` parameters in README and add Plot Harness (#301)

        * Update README to have consistent use of `--base_editor_output` (#16)

        * Add files via upload

        ---------

        Co-authored-by: Kendell Clement <[email protected]>

    commit 7d46c4490235df45c5546b1b470e4e6a99727031
    Author: Cole Lyman <[email protected]>
    Date:   Wed May 10 15:41:33 2023 -0600

        Clarify CRISPRessoWGS intended use (#303)

        * Update README to have consistent use of `--base_editor_output` (#16)

        * Add sample plotting jupyter notebook

        * Add clarifying info to CRISPRessoWGS description

        Clarify WGS usage

    commit 833a701787bb47674b3e921c38cac6189c775cf7
    Author: Kendell Clement <[email protected]>
    Date:   Thu May 4 17:02:46 2023 -0400

        Remove debug print statements

    commit 712eb2a11825e8d36f2870deb12b35486bd633fb
    Author: Kendell Clement <[email protected]>
    Date:   Thu May 4 16:40:07 2023 -0400

        Allow dashes in filenames resolve #73

    commit a439f094745b2b5e7f032f0777d4c67e6d6f93c5
    Author: Kendell Clement <[email protected]>
    Date:   Sat Apr 22 23:41:58 2023 -0400

        Raise exceptions from within futures in plot_pool

    commit 7e807a60de2a9d18bccd034b87106ceaf7153338
    Author: Kendell Clement <[email protected]>
    Date:   Sat Apr 22 23:38:56 2023 -0400

        Fix future pandas indexing warning

        Pandas error was "FutureWarning: Calling float on a single element Series is deprecated and will raise a TypeError in the future. Use float(ser.iloc[0]) instead"

    commit 304a92aa7a7ef8c705cb070dce25d9a2e5745ba9
    Author: Cole Lyman <[email protected]>
    Date:   Thu Apr 20 13:59:27 2023 -0600

        Remove debug print statements fixes #295 (#297)

        The format string option used here is only available in Python version >=3.8.

    commit 478c06f784603e96d20f96e91993fdcc4ac35c8a
    Author: Kendell Clement <[email protected]>
    Date:   Thu Apr 13 12:09:26 2023 -0400

        Update plotCustomAllelePlot.py script for #292 (#293)

        Update type of 'max_rows' param to int
        Fix location of 'args' in crispresso2_info object

    commit bcdae39e05d530f4a4e78738c3b30f7664981919
    Author: Kendell Clement <[email protected]>
    Date:   Mon Mar 27 13:18:34 2023 -0400

        Update pooled parameter format

    commit 546446e36e7e68b527767d6c31ec341a49df2059
    Author: Kendell Clement <[email protected]>
    Date:   Tue Feb 14 16:26:23 2023 -0500

        Fix running plots in parallel (#286)

        The reason the plots were running slower before this change is because I was
        calling the plot function, not passing it to `submit`. So it was essentially
        running in serial, but worse because it was still spinning up/down the
        processes.
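
The mistake described above, calling the function instead of handing it to `submit`, can be reproduced in a few lines. The sketch uses a thread pool so that it runs anywhere; `submit` has the same signature on process pools. `make_plot` is a hypothetical stand-in for a plotting function.

```python
from concurrent.futures import ThreadPoolExecutor


def make_plot(name):
    """Hypothetical stand-in for a plotting function."""
    return "plotted " + name


with ThreadPoolExecutor(max_workers=2) as pool:
    # Wrong: make_plot("a") runs right here in the caller, and its return
    # value (a string, not a callable) is what gets submitted.
    wrong = pool.submit(make_plot("a"))
    # Right: pass the function and its arguments; the pool calls it.
    right = pool.submit(make_plot, "a")

print(type(wrong.exception()).__name__)  # TypeError
print(right.result())                    # plotted a
```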

        Co-authored-by: Cole Lyman <[email protected]>

    commit d75f32a2eb5aeaaee866c09e5655a3e27af8b1a1
    Author: kclem <[email protected]>
    Date:   Fri Feb 10 15:45:15 2023 -0500

        Fix #283 to avoid filename collisions

        Previously, amplicon names longer than 21bp were truncated, but the check for uniqueness wasn't working, so it would overwrite some plot files. This fixes the filename collision and enforces uniqueness in reference filename prefixes. Thanks @mbiokyle29
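
A sketch of one way to enforce unique truncated prefixes follows. The numeric-suffix scheme and the helper `unique_prefixes` are assumptions for illustration and may not match how CRISPResso resolves collisions; only the 21-character limit comes from the message above.

```python
def unique_prefixes(names, max_len=21):
    """Map each name to a truncated prefix, adding a numeric suffix on collision."""
    used = set()
    prefixes = {}
    for name in names:
        candidate = name[:max_len]
        counter = 2
        while candidate in used:
            suffix = "_" + str(counter)
            # Shorten further so the suffixed prefix still fits in max_len
            candidate = name[:max_len - len(suffix)] + suffix
            counter += 1
        used.add(candidate)
        prefixes[name] = candidate
    return prefixes


names = ["my_long_amplicon_name_variant_A", "my_long_amplicon_name_variant_B"]
for name, prefix in unique_prefixes(names).items():
    print(name, "->", prefix)
# my_long_amplicon_name_variant_A -> my_long_amplicon_name
# my_long_amplicon_name_variant_B -> my_long_amplicon_na_2
```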

    commit e577318006cd17b2725bd028e5e56634c6eb829a
    Author: kclem <[email protected]>
    Date:   Mon Feb 6 16:37:25 2023 -0500

        Case-insensitive headers accepted in CRISPRessoPooled

    commit d34927620a4a6126a9988b3041e76f60728abbfe
    Author: Kendell Clement <[email protected]>
    Date:   Tue Jan 31 13:48:33 2023 -0500

        Fix print statement in CORE

    commit ee88b7ed89c395f68225a50dea44a2ad69d5e9a5
    Author: Kendell Clement <[email protected]>
    Date:   Tue Jan 31 13:22:51 2023 -0500

        Version bump to 2.2.12

    commit 1d4679c72d0c8b4154317c9aff5179217198e2d7
    Author: Kendell Clement <[email protected]>
    Date:   Tue Jan 31 13:01:31 2023 -0500

        Status Updates + Pooled Mixed Mode Update (#279)

        * Implement logging handler to overwrite the latest log status to file

        * Add StatusHandler to CRISPRessoCORE log

        This will take the latest log output and write it to a file (`status.txt`), the
        catch being that with each log the file is overwritten so that one can easily
        tell where CRISPResso currently is and what the error is (if any). These changes
        include some slight refactoring in order to accommodate any potential parameter
        exceptions.

        * Add StatusHandler to CRISPRessoBatch and refactor `logger.warn` to `warn`

        * Add StatusHandler to CRISPRessoPooled and a little refactoring

        * Implement `percent_complete` to the status log

        * Add StatusHandler to CRISPRessoAggregate log

        * Add StatusHandler to CRISPRessoCompare log

        * Add StatusHandler to CRISPRessoPooledWGSCompare log

        * Add StatusHandler to CRISPRessoWGS log

        * Rename `status.txt` to `CRISPResso_status.txt`

        * Modify status log names to match the tool they are generated from

        * Add percent_complete stages to CRISPRessoCORE

        These also include log statements of each plot that is being generated as well
        as fixing some variable name collisions with `ind`.

        * Format the percentage in the log to be 2 decimal places

        * Change all plotting logs from `info` to `debug` and simplify progress

        This refactors how the progress of the plots is calculated, making it much
        simpler. Before this change we would have had to keep track of the number of
        times `percent_complete` was output, but now it simply updates the percent
        complete after each amplicon is finished processing. Hopefully this will make
        things easier to maintain even though it will be a little less "accurate" (not
        sure how accurate the original implementation was...).

        * Implemented shared console log handler across all CRISPResso* calls

        This allows for easy changes to logging formatting, which was inspired by having
        to change the default logging level. The default logging level needs to be set
        at `logging.DEBUG` in order for the debug log statements to not be ignored for
        the running and status logs.

        * Add ability to set the verbosity level to each CRISPResso* tool

        This allows users to set a verbosity level between 1 and 4 using the
        `-v`/`--verbosity` CLI parameter. If the `--debug` flag is present, then the
        level will default to 4, being the most verbose.

        * Implement showing the last seen `percent_complete` when none is provided

        * Keep track of and log when multiple parallel runs are completed

        These changes modify `CRISPRessoMultiProcessing.run_crispresso_cmds` such that
        we can now display when a run is completed. This potentially breaks how
        signals and interrupts are handled with multiple runs happening, but this needs
        to be reviewed.

        * Add debug and percentage complete to CRISPRessoBatch

        * Add percent complete to CRISPRessoPooled

        * Add debug and percent_complete message to CRISPRessoAggregate

        * Add `percent_complete` to CRISPRessoCompare

        * Add `percent_complete` to CRISPRessoPooledWGSCompare

        * Add status and `percent_complete` to CRISPRessoMeta

        * Add `verbosity` arguments to CRISPRessoCompare and CRISPRessoPooledWGSCompare

        * Fixing documentation to match pooled headers

        * Header removal bug fix change documentation to guide_seq

        * Update documentation and help feature for CRISPRessoPooled

        * Remove extra newlines from CRISPRessoPooled -h

        * Make variable names as clear as my firstborn child's name

        * Update one more variable name

        * Fix bug to flow CRISPRessoPooled options to sub command

        * Make amplicon file args variable name clear

        * Update how parameters are set and retrieved from parameter object

        The refactor in the previous commit changed the type of the arguments to a
        dictionary which doesn't have the parameters as attributes, and this commit
        fixes that error.

        * Add note in output header for change in default CRISPRessoPooled

        In the next release (2.3.0) the `--demultiplex_only_at_amplicons` will be the
        default when running in mixed-mode. This is to allow for inexact alignments of
        the reads and the amplicons to the genome. For more context, see this issue
        https://github.com/pinellolab/CRISPResso2/issues/276

        * Clarify the verbosity parameter help message

        * Separate out parameters to `normalize_name` in CRISPRessoCORE

        * Separate out parameters to `normalize_name` in CRISPRessoWGS

        * Separate out parameters to `normalize_name` in CRISPRessoPooled

        * Separate out parameters to `normalize_name` in CRISPRessoCompare

        * Fix bug in CRISPRessoPooled by replacing `database_id` with `normalize_name`

        * Refactor `run_crispresso_cmds` to not require a `logger`

        This commit implements the functionality to make the `logger` object optional by
        seeing which module called the `run_crispresso_cmds` function and obtaining the
        correct object from that module name.

        The function also immediately returns when no commands are passed to it.

        * Add amplicon name to plotting debug statements in CRISPRessoCORE
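
The status-file idea from the bullets above (overwrite the file with the latest log message, and reuse the last seen `percent_complete` when none is given) can be sketched with a custom `logging.Handler`. This is an illustrative version, not the handler in CRISPRessoShared: passing the percentage through `extra=` and the output format are assumptions.

```python
import logging
import os
import tempfile


class StatusHandler(logging.Handler):
    """Overwrite a status file with the most recent log message.

    Sketch only: percent_complete is read from the record if supplied via
    extra=, otherwise the last seen value is reused.
    """

    def __init__(self, filename):
        super().__init__()
        self.filename = filename
        self.last_percent = 0.0

    def emit(self, record):
        self.last_percent = getattr(record, "percent_complete", self.last_percent)
        # Mode "w" truncates the file, so only the latest status is kept
        with open(self.filename, "w") as handle:
            handle.write("{:.2f}% {}\n".format(self.last_percent, record.getMessage()))


status_path = os.path.join(tempfile.mkdtemp(), "CRISPResso_status.txt")
logger = logging.getLogger("status_demo")
logger.setLevel(logging.DEBUG)  # DEBUG so that debug messages reach the handler
logger.propagate = False
logger.addHandler(StatusHandler(status_path))

logger.info("Aligning reads", extra={"percent_complete": 25})
logger.debug("Plotting")  # no percentage given: the last seen value is reused

with open(status_path) as handle:
    print(handle.read().strip())  # 25.00% Plotting
```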

        ---------

        Co-authored-by: Cole Lyman <[email protected]>
        Co-authored-by: Cole Lyman <[email protected]>
        Co-authored-by: Cole Lyman <[email protected]>
        Co-authored-by: Samuel Nichols <[email protected]>

    commit ff7eca76e6a3a08af4ac18ac4e88d20f2a06b1f9
    Author: Kendell Clement <[email protected]>
    Date:   Thu Jan 26 15:27:27 2023 -0500

        CRISPRessoPooled custom header fix (#278)

        * Fixing documentation to match pooled headers

        * Header removal bug fix change documentation to guide_seq

        * Update documentation and help feature for CRISPRessoPooled

        * Remove extra newlines from CRISPRessoPooled -h

        * Make variable names as clear as my firstborn child's name

        * Update one more variable name

        Co-authored-by: Samuel Nichols <[email protected]>

    commit 104866e1080c973bb025d1a5ba59b19dca1658af
    Author: Cole Lyman <[email protected]>
    Date:   Thu Jan 5 14:00:26 2023 -0700

        Fix deprecated numpy type names (fixes #269) (#270)

        In the most recent version of numpy (1.24) some of the types have been
        deprecated. This commit fixes these errors.

    commit 58a8e42df88b66fad6b4f6ad04a5b9d9d43d01b4
    Author: Cole Lyman <[email protected]>
    Date:   Thu Jan 5 06:49:35 2023 -0700

        Add snippet about installing CRISPResso2 via bioconda on Apple silicon (#274)

        I have suffered enough trying to debug my installation, so hopefully this helps
        someone else.

        Co-authored-by: Cole Lyman <[email protected]>

    commit b9851e98104602eb78c2b384105267624295e9d3
    Author: Cole Lyman <[email protected]>
    Date:   Thu Dec 22 13:30:23 2022 -0700

        Fix bug when pooled bam is input (#265)

        This change checks to see if a bam file was input, and if so it doesn't try to
        remove any intermediate files because there aren't any.

        Co-authored-by: Cole Lyman <[email protected]>

    commit b822612642043e75a19042941f69b457ce51f517
    Author: Kendell Clement <[email protected]>
    Date:   Mon Dec 19 15:26:45 2022 -0500

        Delete vscode settings

    commit b99aa624dec68ef7d19264340ce0cafa829625f4
    Author: Kendell Clement <[email protected]>
    Date:   Mon Dec 19 13:29:14 2022 -0500

        Clarify input param help for pooled bam

    commit 3fae1e8b821ec6b1890bff6561fa8fa67dc49a04
    Author: Kendell Clement <[email protected]>
    Date:   Mon Dec 19 13:28:54 2022 -0500

        Fix #235 - Cigar string is * if read unaligned

        Previously, the bam would set the cigar string to 0 if the read was unaligned. This breaks the sam->bam conversion and causes the errors in #235.
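
For reference, a SAM record for an unaligned read can be assembled as below. In the SAM format an unavailable CIGAR is written as `*`; the remaining placeholder values are the ones commonly written for unmapped reads. The helper name is hypothetical.

```python
def unaligned_sam_record(read_name, sequence, qualities):
    """Build a SAM line for an unaligned read.

    Field order: QNAME FLAG RNAME POS MAPQ CIGAR RNEXT PNEXT TLEN SEQ QUAL.
    Flag 4 marks the read as unmapped; CIGAR is '*' because no alignment exists.
    """
    fields = [read_name, "4", "*", "0", "0", "*", "*", "0", "0", sequence, qualities]
    return "\t".join(fields)


record = unaligned_sam_record("read1", "ACGT", "IIII")
print(record.split("\t")[5])  # *
```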

    commit c65ba07dc5a983453cdf7bb1e27005230dac6f1b
    Author: Cole Lyman <[email protected]>
    Date:   Thu Dec 8 13:48:17 2022 -0700

        Add deprecation notice (#260)

        * Add FLASh and Trimmomatic deprecation notice to CLI output

        * Add Edilytics email address to CLI output

    commit 2a30e5a45f5350ee7c6435bce1cd4edc4d31668a
    Author: Kendell Clement <[email protected]>
    Date:   Tue Dec 6 12:16:19 2022 -0500

        Format filterReadsOnSequencePresence script

    commit 9d764414edd88a46ad5e4f496e4f1c8d5d60ce3e
    Author: Kendell Clement <[email protected]>
    Date:   Fri Dec 2 22:12:54 2022 -0500

        Clarify default CRISPRessoPooled settings for use_legacy_bowtie2_options_string

    commit 9ddea40f7f02b546941ddaa4c71fc5283075051a
    Author: kclem <[email protected]>
    Date:   Mon Nov 14 10:33:04 2022 -0500

        Add check for prime editing extension sequence in prime edited sequence

        If the user specifies the prime_editing_override_prime_edited_ref_seq, it might not contain the extension seq (e.g. if the extension seq is not provided in the appropriate orientation), so check for that here. The extension sequence should be provided as the reverse complement of the prime edited sequence.

    commit 152f2dd5001da7090641ee8a1326bde9f7e8104e
    Author: kclem <[email protected]>
    Date:   Wed Nov 9 11:53:41 2022 -0500

        Version bump to 2.2.11a

    commit 9ed356e3a0c6c316d0860d121772f80ddca6de1d
    Author: kclem <[email protected]>
    Date:   Wed Nov 9 11:47:30 2022 -0500

        Add param to override prime editing sequence checks

        CRISPResso checks that prime editing guides are provided in the proper orientation (e.g. pegRNA 3'->5', spacer sequence 5'->3') and checks these orientations by alignment. Sometimes, the alignment can be better in the opposite direction, and this parameter allows these checks to be overridden. Otherwise, these checks would halt the program and produce the output 'The prime editing pegRNA spacer sequence appears to be given in the 3\'->5\' order. The prime editing pegRNA spacer sequence (--prime_editing_pegRNA_spacer_seq) must be given in the RNA 5\'->3\' order.'

    commit 39dd80afb98a22b7edb6f801c363d86bb77eeb5b
    Author: kclem <[email protected]>
    Date:   Wed Nov 9 10:06:51 2022 -0500

        Update filterReadsOnSequencePresence.py

    commit fe55526927e3fb6e17c9a8a6f59c7057bc1e14eb
    Author: Kendell Clement <[email protected]>
    Date:   Mon Nov 7 22:25:16 2022 -0500

        Add script to filter input based on sequence presence

    commit 713e57a19c35180035ca35e11a5820065eda0198
    Author: Kendell Clement <[email protected]>
    Date:   Tue Oct 18 16:02:26 2022 -0400

        Allow spaces in read names for CRISPRessoWGS

    commit 39ce008bdddccdd8229c0ba185dce78bc2f66968
    Author: Cole Lyman <[email protected]>
    Date:   Sat Oct 8 21:09:58 2022 -0600

        Fix typo of CRISPResssoPlot when plotting nucleotide quilt (#250)

    commit 6a2b342c8503b7327c0a2414edfbd16912d60ca5
    Author: Kendell Clement <[email protected]>
    Date:   Sat Oct 8 23:08:47 2022 -0400

        Batch amplicon plots (#251)

        * Error out if HDR amplicon matches existing amplicon

        * Add check for amplicon sequence uniqueness

        * Fix bug with bam_input not having bam_output

        * Test for no returned lines in auto mode, version bump to 2.2.11

        * Fix pandas deprecation of df.append

    commit 726b2b93d6e419a1b0aa6a968c97edc55b4cc5a8
    Author: Kendell Clement <[email protected]>
    Date:   Thu Oct 6 16:32:02 2022 -0400

        Fix CRISPRessoBatch plot pool bug when plots are suppressed

    commit 7e5049c4dfb88cbc87c91935a91d1f51120a10c2
    Author: Cole Lyman <[email protected]>
    Date:   Wed Sep 21 21:04:51 2022 -0600

        Fix batch quilt plot name (#249)

        This fixes an incorrectly named allele quilt plot input in CRISPRessoBatch.

    commit 1821ca5029c5a1485733f13ab3f2048b4f1fa04e
    Author: Kendell Clement <[email protected]>
    Date:   Thu Sep 15 15:49:08 2022 -0400

        Version bump to 2.2.10

    commit c5f79aebfc1ae209f4ee320df250eed89a02787c
    Author: Cole Lyman <[email protected]>
    Date:   Wed Sep 14 14:24:55 2022 -0600

        Parallel plot refactor (#247)

        * Fix duplicate plotting in CRISPRessoBatch aggregate

        * Refactor multiprocessing plots in CRISPRessoBatch

        * Refactor multiprocessing plots in CRISPRessoCORE

        * Refactor multiprocessing plots for CRISPRessoAggregate

    commit 4ed5e24e6cc1dd8068e2391573ae2438acd32db2
    Author: Kendell Clement <[email protected]>
    Date:   Tue Sep 13 14:12:11 2022 -0400

        print files in curr dir if Aggregate can't find files

    commit ce25bc06f29988e7a10afd0b6a09ba0caf0950e0
    Author: Kendell Clement <[email protected]>
    Date:   Mon Sep 12 10:32:57 2022 -0400

        Spelling typo

    commit c15f01c75083403f17c58c121b2afe97e9f2a1ec
    Author: Kendell Clement <[email protected]>
    Date:   Tue Sep 6 17:49:52 2022 -0400

        Add helper function to create alignment scoring matrix

        New scoring matrix can be created using CRISPResso2Align.make_matrix()

    commit c80f82838c5a228b79ad4484092877cfee08e02c
    Author: Cole Lyman <[email protected]>
    Date:   Mon Aug 22 18:28:33 2022 -0600

        Add `zip_output` (#240)

        * Making zip of results

        * Zip command added; if zip is true, place_report_in_output_folder is also true; zip removes all files while zipping

        * Adding --zip to compare and pooled/wgs compare

        * Add more formatting changes to CRISPRessoShared

        * Refactoring propagate_crispresso_options so only one version exists

        * Zip added to arguments_to_ignore and warning added when changing arguments

        * Restore styling

        * Update README to include --zip

        * Rename --zip to --zip_output

        * Change --zip to --zip_output in CompareCORE and PooledWGSCompareCORE

        * Bug fix arg to args

        Co-authored-by: Samuel Nichols <[email protected]>

    commit 5de3d7286d8e33c7cf4d3615fce715806e72f511
    Author: Kendell Clement <[email protected]>
    Date:   Thu Aug 11 21:42:34 2022 -0400

        Fix fix to aggregate for CRISPRessoWGS

    commit a2294c266f43b14969a5d6474076f31a77a57173
    Author: Kendell Clement <[email protected]>
    Date:   Thu Aug 11 21:40:50 2022 -0400

        Fix bug in aggregate for WGS

    commit 7ce3eb4abe4b8ceac933272ac9cb16a8bedf26a3
    Author: Kendell Clement <[email protected]>
    Date:   Mon Aug 8 21:53:45 2022 -0400

        Update CRISPRessoWGS to allow non-word characters in region names

    commit 040ac0033d6e250f4e3a412101874cf5e914e08a
    Author: kclem <[email protected]>
    Date:   Mon Aug 8 16:04:59 2022 -0400

        Enable processing of cram files by CRISPRessoWGS

        Adds --reference to samtools view when viewing cram files

    commit cf112a0caba8789e28530cc09171285ec6ea9b4c
    Author: kclem <[email protected]>
    Date:   Mon Aug 8 14:55:46 2022 -0400

        Auto amplicon detection for interleaved input

        Enables processing of interleaved fastq files for guess_guides and guess_amplicons, as well as get_most_frequent_reads. When interleaved input is present, the input is first separated into R1/R2 files, then processing is performed.
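
Separating interleaved input into R1/R2 can be sketched as follows, assuming four lines per record and strictly alternating R1/R2 records. `split_interleaved` is a hypothetical helper working on a list of lines, not the routine added in this commit.

```python
def split_interleaved(lines):
    """Split interleaved FASTQ lines into (r1_lines, r2_lines).

    Assumes 4 lines per record, with records alternating R1, R2, R1, R2, ...
    """
    if len(lines) % 8 != 0:
        raise ValueError("interleaved FASTQ must contain complete R1/R2 record pairs")
    r1, r2 = [], []
    for start in range(0, len(lines), 8):
        r1.extend(lines[start:start + 4])
        r2.extend(lines[start + 4:start + 8])
    return r1, r2


interleaved = [
    "@pair1/1", "ACGT", "+", "IIII",
    "@pair1/2", "TTGA", "+", "IIII",
]
r1, r2 = split_interleaved(interleaved)
print(r1[0], r2[0])  # @pair1/1 @pair1/2
```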

    commit 4ba524dc7b947feca8a0f743837844f9febc2171
    Author: Cole Lyman <[email protected]>
    Date:   Thu Aug 4 11:32:11 2022 -0600

        Potential fix for aggregate plots in Batch mode (#237)

    commit 6097a8a104d3f156ef7c08e196ac37e32bf04c71
    Author: Kendell Clement <[email protected]>
    Date:   Thu Jul 21 22:45:48 2022 -0400

        Fix pct_vectors in crispresso2_info json object

    commit 65a079d86d6f386793397398f839c46014b54543
    Author: Kendell Clement <[email protected]>
    Date:   Wed Jul 20 23:46:37 2022 -0400

        Fix more readme spelling bugs

    commit e817376ecd54cdea1f29e303ca25b9e7d1d38333
    Author: Kendell Clement <[email protected]>
    Date:   Wed Jul 20 23:42:23 2022 -0400

        Fix bug in readme spelling

    commit 49740ba1d66ed6d13a9e154b8b17bc8b5186581d
    Author: Kendell Clement <[email protected]>
    Date:   Wed Jul 20 16:10:09 2022 -0400

        Fix loading of crispresso info from WGS and Pooled

    commit b68a43271115251b18e8955e285ccc18f549e8cd
    Author: Kendell Clement <[email protected]>
    Date:   Thu Jul 14 14:11:04 2022 -0400

        Add plotly to dockerfile

    commit b0b7d41d697304d0d5fc93e3346c9de1b98ba41d
    Author: Kendell Clement <[email protected]>
    Date:   Thu Jul 14 14:10:00 2022 -0400

        Fix #231 Allow N's in bam output (Try 2)

    commit c460b3e73fd06a230dbac2e37c86b833144ebf94
    Author: Kendell Clement <[email protected]>
    Date:   Thu Jul 14 14:09:10 2022 -0400

        Revert "Fix #231 Allow N's in bam output"

        This reverts commit 2f6ad1dbe05210af9ccc1b1f17783cd212a888d3.

    commit 2f6ad1dbe05210af9ccc1b1f17783cd212a888d3
    Author: Kendell Clement <[email protected]>
    Date:   Thu Jul 14 13:52:37 2022 -0400

        Fix #231 Allow N's in bam output

    commit 0a2419e518dc9b3520058c3927f98b31cd51347e
    Author: Cole Lyman <[email protected]>
    Date:   Fri Jul 8 21:10:01 2022 -0600

        Fix bug when name is provided instead of amplicon_name in pooled input file (#229)

        Also, raise an exception (instead of incorrectly executing) when there are not
        enough matched parameters in the pooled input file.

    commit cb58212379803788c04ca5793baaa760cbbeaa81
    Author: Cole Lyman <[email protected]>
    Date:   Fri Jul 8 21:09:49 2022 -0600

        Fix bug when comparing two samples with the same name. (#228)

    commit e8a796f5f451409cbafed4404dfba4b6b8a124ca
    Author: Kendell Clement <[email protected]>
    Date:   Thu Jun 23 21:30:23 2022 -0400

        Version bump to 2.2.9

    commit 632143ddedea48bab9229baeb4bf3ea4d1f658d6
    Author: Cole Lyman <[email protected]>
    Date:   Mon Jun 20 19:53:14 2022 -0600

        Don't run global frameshift plot when there are no reads (#226)

        When there are no reads (i.e. global_MODIFIED_FRAMESHIFT +
        global_MODIFIED_NON_FRAMESHIFT + global_NON_MODIFIED_NON_FRAMESHIFT == 0) there
        was a bug when trying to compute the pie chart, because all of the values in the
        pie chart are 0. This fix makes sure that there is at least one read before
        the plot is constructed.
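
The guard described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the actual CRISPResso2 code: the function name `frameshift_fractions` and its argument names are hypothetical, and only the idea of checking that the three counts sum to more than zero comes from the message above.

```python
def frameshift_fractions(modified_frameshift, modified_non_frameshift,
                         non_modified_non_frameshift):
    """Return pie-chart fractions, or None when there are no reads to plot.

    Hypothetical helper: the three arguments stand in for the
    global_MODIFIED_FRAMESHIFT, global_MODIFIED_NON_FRAMESHIFT and
    global_NON_MODIFIED_NON_FRAMESHIFT counts named in the commit message.
    """
    counts = [modified_frameshift, modified_non_frameshift,
              non_modified_non_frameshift]
    total_reads = sum(counts)
    # With zero reads every wedge would be 0, so skip the plot entirely.
    if total_reads == 0:
        return None
    return [count / total_reads for count in counts]


if __name__ == "__main__":
    print(frameshift_fractions(0, 0, 0))
    print(frameshift_fractions(1, 1, 2))
```

A caller would draw the pie chart only when the helper returns a list, and skip the plot otherwise.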

    commit 4bb06218e835d2624d53fd401542caef6f8a3a55
    Author: kclem <[email protected]>
    Date:   Fri Jun 3 16:57:02 2022 -0400

        Improvements for guide inference in 'auto' mode

        In 'auto' mode, a putative guide sequence is selected at the site of maximal editing. If the site of maximal editing falls near an end of the amplicon (e.g. base 0), many things will break (e.g. quantification windows). This update excludes bases from the guide search using the --exclude_bp_from_left and --exclude_bp_from_right parameters. By default, these parameters are 15bp, so the first and last 15bp cannot be selected as the site of maximal editing and thus cannot become the site of a guide sequence. In addition, the site of maximal editing must have at least 3x the magnitude of the background.
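
The selection rule in this commit message could look something like the sketch below. It is an illustration under stated assumptions, not the CRISPResso2 implementation: the function name `find_max_editing_site` is hypothetical, the input is assumed to be a list of per-position editing frequencies, and "background" is assumed here to be the mean frequency of the other candidate positions, which the commit message does not define.

```python
def find_max_editing_site(edit_freqs, exclude_left=15, exclude_right=15,
                          min_fold=3.0):
    """Return the index of the maximal-editing site, or None if none qualifies.

    Hypothetical sketch. exclude_left / exclude_right mirror the 15bp defaults
    mentioned for --exclude_bp_from_left and --exclude_bp_from_right.
    """
    start = exclude_left
    end = len(edit_freqs) - exclude_right
    if end <= start:
        return None  # nothing left to search after trimming both ends

    window = list(edit_freqs[start:end])
    peak_offset = max(range(len(window)), key=window.__getitem__)
    peak = window[peak_offset]

    # ASSUMPTION: background = mean of the remaining candidate positions.
    others = window[:peak_offset] + window[peak_offset + 1:]
    background = sum(others) / len(others) if others else 0.0

    # Require a real signal that is at least min_fold times the background.
    if peak <= 0 or peak < min_fold * background:
        return None
    return start + peak_offset


if __name__ == "__main__":
    freqs = [0.01] * 40
    freqs[0] = 0.9   # strong signal at the edge, inside the excluded region
    freqs[20] = 0.5  # strong signal inside the searchable region
    print(find_max_editing_site(freqs))
```

In this example the edge position is ignored because it lies within the first 15bp, so the interior peak is the one reported.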

    commit 9d64de187835b2553ad2b4374d32edab27f83645
    Author: Kendell Clement <[email protected]>
    Date:   Thu Jun 2 20:22:25 2022 -0400

        Update README.md

    commit 6aafc5387986f5089ba55b68d128343d68052792
    Author: Simon P Shen <[email protected]>
    Date:   Tue May 31 17:42:53 2022 -0400

        directory in quotes in batch cmd (#222)

        Add quotes around output folder for folders that have spaces.
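
One common way to make a folder with spaces safe in a shell command is `shlex.quote` from the Python standard library. The sketch below is only an illustration of that idea: the command name `some_tool`, its `--output_folder` flag, and the helper `build_command` are placeholders, and the commit itself may simply wrap the path in literal quote characters.

```python
import shlex


def build_command(output_folder):
    """Build a shell command string with the folder quoted for the shell.

    Hypothetical helper: 'some_tool' and '--output_folder' are placeholders.
    """
    return "some_tool --output_folder " + shlex.quote(output_folder)


if __name__ == "__main__":
    cmd = build_command("my results/run 1")
    print(cmd)
    # Splitting the string the way a POSIX shell would keeps the path whole.
    print(shlex.split(cmd))
```

Without the quoting, a shell would split the path at the space and pass two separate arguments.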

    commit 432f163ac68b9a650d1fd326171aadc505ee87f4
    Author: Kendell Clement <[email protected]>
    Date:   Tue May 24 23:38:36 2022 -0400

        CRISPRessoBatch fills NA values in batch settings

 …